ht://Dig: Configuration file attributes

Configuration file format -- Attributes



ht://Dig Copyright © 1995-2000 The ht://Dig Group
Please see the file COPYING for license information.




Alphabetical list of attributes




add_anchors_to_excerpt



 type:

 boolean

 used by:

 htsearch

 default:

 true

 description:

 If set to true, the first occurrence of each matched word in the excerpt will be linked to the closest anchor in the document. This only has effect if the EXCERPT variable is used in the output template and the excerpt is actually going to be displayed.

 example:

 add_anchors_to_excerpt: no






allow_in_form



 type:

 string list

 used by:

 htsearch

 default:

 <empty>

 description:

 Allows the specified config file attributes to be specified in search forms as separate fields. This could be used to allow form writers to design their own headers and footers and specify them in the search form. Another example would be to offer a menu of search_algorithms in the form.

 <SELECT NAME="search_algorithm">
 <OPTION VALUE="exact:1 prefix:0.6 synonyms:0.5 endings:0.1" SELECTED>fuzzy
 <OPTION VALUE="exact:1">exact
 </SELECT>

 example:

 allow_in_form: search_algorithm search_results_header






allow_numbers



 type:

 boolean

 used by:

 htdig

 default:

 false

 description:

 If set to true, numbers are considered words. This means that searches can be done on numbers as well as regular words. All the same rules apply to numbers as to words. See the description of valid_punctuation for the rules used to determine what a word is.

 example:

 allow_numbers: true






allow_virtual_hosts



 type:

 boolean

 used by:

 htdig

 default:

 true

 description:

 If set to true, htdig will index virtual web sites as expected. If false, all URL host names will be normalized into whatever the DNS server claims the IP address to map to. If this option is set to false, there is no way to index either "soft" or "hard" virtual web sites.

 example:

 allow_virtual_hosts: false






authorization



 type:

 string

 used by:

 htdig

 default:

 <empty>

 description:

 This tells htdig to send the supplied username:password with each HTTP request. The credentials will be encoded using the "Basic" authentication scheme. There must be a colon (:) between the username and password.

 This attribute can also be specified on htdig's command line using the -u option, and will be blotted out so it won't show up in a process listing. If you use it directly in a configuration file, be sure to protect it so it is readable only by you, and do not use that same configuration file for htsearch.

 example:

 authorization: myusername:mypassword
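
 As an illustration of the "Basic" scheme mentioned above, the following Python sketch builds the header value that an HTTP client would send for a username:password pair. The function name basic_auth_header is hypothetical and is not part of ht://Dig; it only shows the standard Base64 encoding step under that assumption.

```python
import base64


def basic_auth_header(credentials: str) -> str:
    """Return the value of an HTTP 'Authorization' header for Basic auth.

    `credentials` is expected in the 'username:password' form used by the
    authorization attribute. The function name is hypothetical.
    """
    if ":" not in credentials:
        raise ValueError("expected a colon between username and password")
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return "Basic " + token


header = basic_auth_header("myusername:mypassword")
```

 Base64 is an encoding and not encryption, which is why the configuration file holding the credentials should be readable only by you.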






backlink_factor



 type:

 number

 used by:

 htsearch

 default:

 1000

 description:

 This is a weight of "how important" a page is, based on the number of URLs pointing to it. It's actually multiplied by the ratio of the incoming URLs (backlinks) to outgoing URLs (links on the page), to balance out pages with lots of links to pages that link back to them. The ratio gives a lower weight to "link farms", which often have many links to them. This factor can be changed without changing the database in any way. However, setting this value to something other than 0 incurs a slowdown on search results.

 example:

 backlink_factor: 501.1
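
 The ratio described above can be sketched as follows. This is only an illustration of the arithmetic in the description, not the actual htsearch scoring code; the function name and the handling of pages with no outgoing links are assumptions.

```python
def backlink_score(backlink_factor: float, backlinks: int, outgoing_links: int) -> float:
    """Weight contributed by backlinks: factor * (incoming / outgoing).

    Hypothetical helper. Treating zero outgoing links as one is an
    assumption made here only to avoid division by zero.
    """
    if backlink_factor == 0:
        return 0.0
    return backlink_factor * backlinks / max(outgoing_links, 1)


# A page with 5 incoming and 10 outgoing links, using the default factor.
score = backlink_score(1000, 5, 10)
```

 With the same number of backlinks, a page with many outgoing links scores lower, which is the balancing effect the ratio is meant to have.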






bad_extensions



 type:

 string list

 used by:

 htdig

 default:

 .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi

 description:

 This is a list of extensions on URLs which are considered non-parsable. This list is used mainly to supplement the MIME-types that the HTTP server provides with documents. Some HTTP servers do not have a correct list of MIME-types and so can advertise certain documents as text while they are some binary format. See also valid_extensions.

 example:

 bad_extensions: .foo .bar .bad






bad_querystr



 type:

 string list

 used by:

 htdig

 default:

 <empty>

 description:

 This is a list of CGI query strings to be excluded from indexing. This can be used in conjunction with CGI-generated portions of a website to control which pages are indexed.

 example:

 bad_querystr: forum=private section=topsecret&passwd=required






bad_word_list



 type:

 string

 used by:

 htdig and htsearch

 default:

 ${common_dir}/bad_words

 description:

 This specifies a file which contains words which should be excluded when digging or searching. This list should include the most common words or other words that you don't want to be able to search on (things like sex or smut are examples of these).

 The file should contain one word per line. A sample bad words file is located in the contrib/examples directory.

 example:

 bad_word_list: ${common_dir}/badwords.txt



bin_dir



 type:

 string

 used by:

 htdig, htnotify, htfuzzy, htmerge and htsearch

 default:

 BIN_DIR

 description:

 This is the directory in which the executables related to ht://Dig are installed. It is never used directly by any of the programs, but other attributes can be defined in terms of this one.

 The default value of this attribute is determined at compile time.

 example:

 bin_dir: /usr/local/bin




build_select_lists



 type:

 quoted string list

 used by:

 htsearch

 default:

 <empty>

 description:

 This list allows you to define any htsearch input parameter as a select list for use in templates, provided you also define the corresponding name list attribute which enumerates all the choices to put in the list. It can be used for existing input parameters, as well as any you define using the allow_in_form attribute. The entries in this list each consist of an octuple, a set of eight strings defining the variables and how they are to be used to build a select list. The attribute can contain many of these octuples. The strings in the string list are merely taken eight at a time. For each octuple of strings specified in build_select_lists, the elements have the following meaning:

  1. the name of the template variable to be defined as a list
  2. the input parameter name that the select list will set
  3. the name of the user-defined attribute containing the name list
  4. the tuple size used in the name list above
  5. the index into a name list tuple for the value
  6. the index for the corresponding label on the selector
  7. the configuration attribute where the default value for this input parameter is defined
  8. the default label, if not an empty string, which will be used as the label for an additional list item for the current input parameter value if it doesn't match any value in the given list

 See the select list documentation for more information on this attribute.

 example:

 build_select_lists: MATCH_LIST matchesperpage matches_per_page_list \
     1 1 1 matches_per_page "Previous Amount" \
     RESTRICT_LIST restrict restrict_names 2 1 2 restrict "" \
     FORMAT_LIST format template_map 3 2 1 template_name ""
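
 The "eight at a time" grouping can be sketched in Python as below. This is an illustration only and not htsearch's own parser: the field names are made up for readability, and the sketch assumes the backslash line continuations have already been joined into one string.

```python
import shlex

# Hypothetical names for the eight positions described in the list above.
FIELDS = (
    "template_variable", "input_parameter", "name_list_attribute",
    "tuple_size", "value_index", "label_index",
    "default_attribute", "default_label",
)


def parse_select_lists(value: str) -> list:
    """Split a quoted string list and group it into octuples."""
    tokens = shlex.split(value)  # honours double quotes, keeps "" as an empty string
    if len(tokens) % len(FIELDS) != 0:
        raise ValueError("build_select_lists needs a multiple of eight strings")
    return [
        dict(zip(FIELDS, tokens[i:i + len(FIELDS)]))
        for i in range(0, len(tokens), len(FIELDS))
    ]


octuples = parse_select_lists(
    'MATCH_LIST matchesperpage matches_per_page_list 1 1 1 matches_per_page "Previous Amount" '
    'RESTRICT_LIST restrict restrict_names 2 1 2 restrict "" '
    'FORMAT_LIST format template_map 3 2 1 template_name ""'
)
```

 Applied to the example value, this yields three octuples, one each for MATCH_LIST, RESTRICT_LIST and FORMAT_LIST.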


case_sensitive



 type:

 boolean

 used by:

 htdig

 default:

 true

 description:

 This specifies whether ht://Dig should consider URLs case-sensitive or not. If your server is case-insensitive, you should probably set this to false.

 example:

 case_sensitive: false


common_dir



 type:

 string

 used by:

 htdig, htnotify, htfuzzy, htmerge and htsearch

 default:

 COMMON_DIR

 description:

 Specifies the directory for files that will or can be shared among different search databases. The default value for this attribute is defined at compile time.

 example:

 common_dir: /tmp




common_url_parts



 type:

 string list

 used by:

 htdig, htnotify, htmerge and htsearch

 default:

 http:// http://www. ftp:// ftp://ftp. /pub/ .html .gif .jpg .jpeg /index.html /index.htm .com/ .com mailto:

 description:

 Sub-strings often found in URLs stored in the database. These are replaced in the database by an internal space-saving encoding. If a string specified in url_part_aliases overlaps any string in common_url_parts, the common_url_parts string is ignored.

 Note that when this attribute is changed, the database should be rebuilt, unless the effect of "changing" the affected URLs in the database is wanted.

 example:

 common_url_parts: http://www.htdig.org/ml/ \
     .html \
     http://dev.htdig.org/ \
     http://www.htdig.org/


compression_level



 type:

 number

 used by:

 htdig

 default:

 0

 description:

 If specified and the zlib compression library was available when compiled, this attribute controls the amount of compression used in the doc_db file. Defaults to zero to provide backward compatibility with old databases.

 example:

 compression_level: 6


config_dir



 type:

 string

 used by:

 htdig, htnotify, htfuzzy, htmerge and htsearch

 default:

 CONFIG_DIR

 description:

 This is the directory which contains all configuration files related to ht://Dig. It is never used directly by any of the programs, but other attributes or the include directive can be defined in terms of this one.

 The default value of this attribute is determined at compile time.

 example:

 config_dir: /var/htdig/conf


create_image_list



 type:

 boolean

 used by:

 htdig

 default:

 false

 description:

 If set to true, a file with all the image URLs that were seen will be created, one URL per line. This list will not be in any order and there will be lots of duplicates, so after htdig has completed, it should be piped through sort -u to get a unique list.

 example:

 create_image_list: yes



create_url_list



 type:

 boolean

 used by:

 htdig

 default:

 false

 description:

 If set to true, a file with all the URLs that were seen will be created, one URL per line. This list will not be in any order and there will be lots of duplicates, so after htdig has completed, it should be piped through sort -u to get a unique list.

 example:

 create_url_list: yes
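
 Where sort -u is not available, the same clean-up can be approximated with a few lines of Python. This is a sketch under the assumption that the list holds one URL per line; the function name is hypothetical, and the ordering is plain string order, which may differ from a locale-aware sort.

```python
def unique_sorted_lines(lines):
    """Drop blank lines and duplicates, then sort, roughly like `sort -u`."""
    return sorted({line.strip() for line in lines if line.strip()})


raw = [
    "http://www.htdig.org/\n",
    "http://dev.htdig.org/\n",
    "http://www.htdig.org/\n",
    "\n",
]
cleaned = unique_sorted_lines(raw)
```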



database_base



 type:

 string

 used by:

 htdig, htnotify, htfuzzy, htmerge and htsearch

 default:

 ${database_dir}/db

 description:

 This is the common prefix for files that are specific to a search database. Many different attributes use this prefix to specify filenames. Several search databases can share the same directory by just changing this value for each of the databases.

 example:

 database_base: ${database_dir}/sales



database_dir



 type:

 string

 used by:

 htdig, htnotify, htfuzzy, htmerge and htsearch

 default:

 DATABASE_DIR

 description:

 This is the directory which contains all database and other files related to ht://Dig. It is never used directly by any of the programs, but other attributes are defined in terms of this one.

 The default value of this attribute is determined at compile time.

 example:

 database_dir: /var/htdig


date_factor



 type:

 number

 used by:

 htsearch

 default:

 0

 description:

 This factor, like backlink_factor, can be changed without modifying the database. It gives higher rankings to newer documents and lower rankings to older documents. Before setting this factor, it's advised to make sure your servers are returning accurate dates (check the dates returned in the long format). Additionally, setting this to a nonzero value incurs a performance hit on searching.

 example:

 date_factor: 0.35




date_format



 type:

 string

 used by:

 htsearch

 default:

 <empty>

 description:

 This format string determines the output format for modification dates of documents in the search results. It is interpreted by your system's strftime function. Please refer to your system's manual page for this function, for a description of available format codes. If this format string is empty, as it is by default, htsearch will pick a format itself. In this case, the iso_8601 attribute can be used to modify the appearance of the date.

 example:

 date_format: %Y-%m-%d
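
 To preview what a format string produces before putting it in the configuration, the same codes can be tried through Python's strftime, which uses the same family of format codes as the C function. The date used below is only sample input.

```python
from datetime import datetime

# Sample modification date; any datetime value would do.
modified = datetime(1998, 10, 31, 11, 28, 13)

# The format string from the example above.
formatted = modified.strftime("%Y-%m-%d")
print(formatted)  # prints 1998-10-31
```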


description_factor



 type:

 number

 used by:

 htdig

 default:

 150

 description:

 Plain old "descriptions" are the text of a link pointing to a document. This factor gives weight to the words of these descriptions of the document. Not surprisingly, these can be pretty accurate summaries of a document's content. See also title_factor or text_factor. Changing this factor will require updating your database.

 example:

 description_factor: 350




doc_db



 type:

 string

 used by:

 htdig, htmerge and htsearch

 default:

 ${database_base}.docdb

 description:

 This file will contain a Berkeley database of documents indexed by URL. It contains all the information gathered for each document, so this file can become rather large if max_head_length is set to a large value.

 example:

 doc_db: ${database_base}documents.db


doc_index



 type:

 string

 used by:

 htmerge and htsearch

 default:

 ${database_base}.docs.index

 description:

 This file will contain a Berkeley database which maps document numbers to document URLs. It is basically an intermediate database from the word database to the document database.

 example:

 doc_index: documents.index.db


doc_list



 type:

 string

 used by:

 htdig

 default:

 ${database_base}.docs

 description:

 This file is basically a text version of the file specified in doc_db. Its only use is to have a human readable database of all documents. The file is easy to parse with tools like perl or tcl.

 example:

 doc_list: /tmp/documents.text



end_ellipses



 type:

 string

 used by:

 htsearch

 default:

 <b><tt> ...</tt></b>

 description:

 When excerpts are displayed in the search output, this string will be appended to the excerpt if there is text following the text displayed. This is just a visual reminder to the user that the excerpt is only part of the complete document.

 example:

 end_ellipses: ...



end_highlight



 type:

 string

 used by:

 htsearch

 default:

 </strong>

 description:

 When excerpts are displayed in the search output, matched words will be highlighted using start_highlight and this string. You should ensure that highlighting tags are balanced, that is, this string should close any formatting tag opened by start_highlight.

 example:

 end_highlight: </font>





endings_affix_file



 type:

 string

 used by:

 htfuzzy

 default:

 ${common_dir}/english.aff

 description:

 Specifies the location of the file which contains the affix rules used to create the endings search algorithm databases. Consult the documentation on htfuzzy for more information on the format of this file.

 example:

 endings_affix_file: /var/htdig/affix_rules





endings_dictionary



 type:

 string

 used by:

 htfuzzy

 default:

 ${common_dir}/english.0

 description:

 Specifies the location of the file which contains the dictionary used to create the endings search algorithm databases. Consult the documentation on htfuzzy for more information on the format of this file.

 example:

 endings_dictionary: /var/htdig/dictionary


endings_root2word_db



 type:

 string

 used by:

 htfuzzy and htsearch

 default:

 ${common_dir}/root2word.db

 description:

 This attribute specifies the database filename to be used in the 'endings' fuzzy search algorithm. The database maps word roots to all legal words with that root. For more information about this and other fuzzy search algorithms, consult the htfuzzy documentation.

 Note that the default value uses the common_dir attribute instead of the database_dir attribute. This is because this database can be shared with different search databases.

 example:

 endings_root2word_db: /var/htdig/r2w.db




endings_word2root_db



 type:

 string

 used by:

 htfuzzy and htsearch

 default:

 ${common_dir}/word2root.db

 description:

 This attribute specifies the database filename to be used in the 'endings' fuzzy search algorithm. The database maps words to their root. For more information about this and other fuzzy search algorithms, consult the htfuzzy documentation.

 Note that the default value uses the common_dir attribute instead of the database_dir attribute. This is because this database can be shared with different search databases.

 example:

 endings_word2root_db: /var/htdig/w2r.bm



excerpt_length



 type:

 number

 used by:

 htsearch

 default:

 300

 description:

 This is the maximum number of characters the displayed excerpt will be limited to. The first matched word will be highlighted in the middle of the excerpt so that there is some surrounding context.

 The start_ellipses and end_ellipses are used to indicate that the document contains text before and after the displayed excerpt respectively. The start_highlight and end_highlight are used to specify what formatting tags are used to highlight matched words.

 example:

 excerpt_length: 500




excerpt_show_top



 type:

 boolean

 used by:

 htsearch

 default:

 false

 description:

 If set to true, the excerpt of a match will always show the top of the matching document. If it is false (the default), the excerpt will attempt to show the part of the document that actually contains one of the words.

 example:

 excerpt_show_top: yes


exclude_urls



 type:

 string list

 used by:

 htdig

 default:

 /cgi-bin/ .cgi

 description:

 If a URL contains any of the space separated patterns, it will be rejected. This is used to exclude such common things as an infinite virtual web-tree which starts with cgi-bin.

 example:

 exclude_urls: students.html cgi-bin
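
 The "contains any of the patterns" rule can be pictured with the short Python sketch below. It illustrates the rule as described and is not htdig's implementation; the function name is hypothetical, and plain substring matching is assumed.

```python
def is_excluded(url: str, exclude_urls: str) -> bool:
    """Return True if the URL contains any space separated pattern."""
    patterns = exclude_urls.split()
    return any(pattern in url for pattern in patterns)


default_patterns = "/cgi-bin/ .cgi"
rejected = is_excluded("http://www.foo.com/cgi-bin/search", default_patterns)
accepted = not is_excluded("http://www.foo.com/index.html", default_patterns)
```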



external_parsers



 type:

 quoted string list

 used by:

 htdig

 default:

 <empty>

 description:

 This attribute is used to specify a list of content-type/parsers that are to be used to parse documents that cannot be parsed by any of the internal parsers. The list of external parsers is examined before the builtin parsers are checked, so this can be used to override the internal behavior without recompiling htdig.

 The external parsers are specified as pairs of strings. The first string of each pair is the content-type that the parser can handle while the second string of each pair is the path to the external parsing program. If quoted, it may contain parameters, separated by spaces.

 External parsing can also be done with external converters, which convert one content-type to another. To do this, instead of just specifying a single content-type as the first string of a pair, you specify two types, in the form type1->type2, as a single string with no spaces. The second string will define an external converter rather than an external parser, to convert the first type to the second. If the second type is user-defined, then it's up to the converter script to put out a "Content-Type: type" header followed by a blank line, to indicate to htdig what type it should expect for the output, much like what a CGI script would do. The resulting content-type must be one that htdig can parse, either internally, or with another external parser or converter.

 Only one external parser or converter can be specified for any given content-type.

 The two main internal parsers are for text/html and text/plain. There is also a simple parser for application/pdf, described under pdf_parser, which is quite limited and is typically overridden with an external one.

 The parser program takes four command-line parameters, not counting any parameters already given in the command string:

 infile content-type URL configuration-file

 Parameter           Description                                       Example
 infile              A temporary file with the contents to be parsed.  /var/tmp/htdext.14242
 content-type        The MIME-type of the contents.                    text/html
 URL                 The URL of the contents.                          http://www.htdig.org/attrs.html
 configuration-file  The configuration-file in effect.                 /etc/htdig/htdig.conf

 The external parser is to write information for htdig on its standard output. Unless it is an external converter, which will output a document of a different content-type, its output must follow the format described here.

 The output consists of records, each record terminated with a newline. Each record is a series of (unless expressly allowed to be empty) non-empty tab-separated fields. The first field is a single character that specifies the record type. The rest of the fields are determined by the record type.

 Record type  Fields                 Description
 w            word                   A word that was found in the document.
              location               A number indicating the normalized location of the word within the document. The number has to fall in the range 0-1000 where 0 means the top of the document.
              heading level          A heading level that is used to compute the weight of the word depending on its context in the document itself. The level is in the range of 0-10 and is defined as follows:
                                       0   Normal text
                                       1   Title text
                                       2   Heading 1 text
                                       3   Heading 2 text
                                       4   Heading 3 text
                                       5   Heading 4 text
                                       6   Heading 5 text
                                       7   Heading 6 text
                                       8   unused
                                       9   unused
                                       10  Keywords
 u            document URL           A hyperlink to another document that is referenced by the current document. It must be complete and non-relative, using the URL parameter to resolve any relative references found in the document.
              hyperlink description  For HTML documents, this would be the text between the <a href...> and </a> tags.
 t            title                  The title of the document.
 h            head                   The top of the document itself. This is used to build the excerpt. This should only contain normal ASCII text.
 a            anchor                 The label that identifies an anchor that can be used as a target in a URL. This really only makes sense for HTML documents.
 i            image URL              A URL that points at an image that is part of the document.
 m            http-equiv             The HTTP-EQUIV attribute of a META tag. May be empty.
              name                   The NAME attribute of this META tag. May be empty.
              contents               The CONTENT attribute of this META tag. May be empty.

 See also FAQ questions 4.8 and 4.9 for more examples.

 example:

 external_parsers: text/html /usr/local/bin/htmlparser \
     application/pdf /usr/local/bin/parse_doc.pl \
     application/msword->text/plain "/usr/local/bin/mswordtotxt -w" \
     application/x-gunzip->user-defined /usr/local/bin/ungzipper
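
 To make the record format more concrete, here is a minimal Python sketch that builds a title record and one word record per word for a piece of plain text. It is an illustration only: the function name, the lower-casing of words, and the way locations are spread over 0-1000 are assumptions, and a real parser would also read the infile parameter and emit the other record types where relevant.

```python
def make_records(title: str, text: str) -> list:
    """Build tab-separated records: one 't' record, then one 'w' per word.

    Each 'w' record carries the word, a normalized location (0 is the top
    of the document) and heading level 0 for normal text.
    """
    records = ["t\t" + title]
    words = text.split()
    for index, word in enumerate(words):
        location = index * 1000 // len(words)  # always within 0-1000
        records.append("w\t%s\t%d\t0" % (word.lower(), location))
    return records


records = make_records("Demo", "alpha beta")
# A real parser would write each record, newline-terminated, to standard output.
output = "".join(record + "\n" for record in records)
```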



extra_word_characters



 type:

 string

 used by:

 htdig and htsearch

 default:

 <empty>

 description:

 These characters are considered part of a word. In contrast to the characters in the valid_punctuation attribute, they are treated just like letter characters.

 Note that the locale attribute is normally used to configure which characters constitute letter characters.

 example:

 extra_word_characters: _


heading_factor_1 - heading_factor_6



 type:

 number

 used by:

 htdig

 default:

 heading_factor_1: 5
 heading_factor_2: 4
 heading_factor_3: 3
 heading_factor_4: 1
 heading_factor_5: 1
 heading_factor_6: 0

 description:

 This is a factor which will be used to multiply the weight of words between <h1> and </h1> tags (and likewise for <h2> through <h6>). It is used to assign the level of importance to certain headers. Setting a factor to 0 will cause words in this heading to be ignored. The number may be a floating point number. See also the title_factor and text_factor attributes.

 example:

 heading_factor_1: 7.75
 heading_factor_2: 5.3
 heading_factor_3: 2
 heading_factor_4: 0
 heading_factor_5: 0
 heading_factor_6: 0




htnotify_sender



 type:

 string

 used by:

 htnotify

 default:

 webmaster@www

 description:

 This specifies the email address that htnotify email messages get sent out from. The address is forged using /usr/lib/sendmail. Check htnotify/htnotify.cc for detail on how this is done.

 example:

 htnotify_sender: bigboss@yourcompany.com


http_proxy



 type:

 string

 used by:

 htdig

 default:

 <empty>

 description:

 When this attribute is set, all HTTP document retrievals will be done using the HTTP-PROXY protocol. The URL specified in this attribute points to the host and port where the proxy server resides.

 The use of a proxy server can greatly improve performance of the indexing process.

 example:

 http_proxy: http://proxy.bigbucks.com:3128



http_proxy_exclude



 type:

 string list

 used by:

 htdig

 default:

 <empty>

 description:

 When this is set, URLs matching this will not use the proxy. This is useful when you have a mixture of sites near to the digging server and far away.

 example:

 http_proxy_exclude: http://intranet.foo.com/


image_list



 type:

 string

 used by:

 htdig

 default:

 ${database_base}.images

 description:

 This is the file that a list of image URLs gets written to by htdig when create_image_list is set to true. As image URLs are seen, they are just appended to this file, so after htdig finishes it is probably a good idea to run sort -u on the file to eliminate duplicates from the file.

 example:

 image_list: allimages






image_url_prefix



 type:

 string

 used by:

 htsearch

 default:

 IMAGE_URL_PREFIX

 description:

 This specifies the directory portion of the URL used to display star images. This attribute isn't directly used by htsearch, but is used in the default URL for the star_image and star_blank attributes, and other attributes may be defined in terms of this one.

 The default value of this attribute is determined at compile time.

 example:

 image_url_prefix: /images/htdig



include



 type:

 string

 used by:

 htdig, htnotify, htfuzzy, htmerge and htsearch

 description:

 This is not quite a configuration attribute, but rather a directive. It can be used within one configuration file to include the definitions of another file. The last definition of an attribute is the one that applies, so after including a file, any of its definitions can be overridden with subsequent definitions. This can be useful when setting up many configurations that are mostly the same, so all the common attributes can be maintained in a single configuration file. The include directives can be nested, but watch out for nesting loops.

 example:

 include: ${config_dir}/htdig.conf






iso_8601



 type:

 boolean

 used by:

 htsearch and htnotify

 default:

 false

 description:

 This sets whether dates should be output in ISO 8601 format. For example, this was written on: 1998-10-31 11:28:13 EST. See also the date_format attribute, which can override any date format that htsearch picks by default.

 This attribute also affects the format of the date htnotify expects to find in a htdig-notification-date field.

 example:

 iso_8601: true


keywords_factor



 type:

 number

 used by:

 htdig

 default:

 100

 description:

 This is a factor which will be used to multiply the weight of words in the list of keywords of a document. The number may be a floating point number. See also the title_factor and text_factor attributes.

 example:

 keywords_factor: 12




keywords_meta_tag_names



 type:

 string list

 used by:

 htdig

 default:

 keywords htdig-keywords

 description:

 The words in this list are used to search for keywords in HTML META tags. This list can contain any number of strings that each will be seen as the name for whatever keyword convention is used.

 The META tags have the following format:

 <META name="somename" content="somevalue">

 example:

 keywords_meta_tag_names: keywords description





limit_normalized



 type:

 string list

 used by:

 htdig

 default:

 <empty>

 description:

 This specifies a set of patterns that all URLs have to match against in order for them to be included in the search. Unlike the limit_urls_to directive, this is done after the URL is normalized and the server_aliases directive is applied. This allows filtering after any hostnames and DNS aliases are resolved. Otherwise, this directive is the same as the limit_urls_to directive.

 example:



limit_urls_to



 type:

 string list

 used by:

 htdig

 default:

 ${start_url}

 description:

 This specifies a set of patterns that all URLs have to match against in order for them to be included in the search. Any number of strings can be specified, separated by spaces. If multiple patterns are given, at least one of the patterns has to match the URL.

 Matching is a case-insensitive string match on the URL to be used. The match will be performed after the relative references have been converted to a valid URL. This means that the URL will always start with http://.

 Granted, this is not the perfect way of doing this, but it is simple enough and it covers most cases.

 example:

 limit_urls_to: .sdsu.edu kpbs
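
 The matching rule can be sketched in Python as follows. This illustrates the description above and is not htdig's code; the function name is hypothetical, and the sketch assumes a pattern matches when it occurs anywhere in the URL.

```python
def is_within_limits(url: str, limit_urls_to: str) -> bool:
    """Return True if at least one pattern occurs in the URL, ignoring case."""
    patterns = limit_urls_to.lower().split()
    lowered = url.lower()
    return any(pattern in lowered for pattern in patterns)


limits = ".sdsu.edu kpbs"
inside = is_within_limits("http://www.SDSU.edu/courses/", limits)
outside = is_within_limits("http://www.example.com/", limits)
```

 Because the patterns are plain strings, a short pattern such as kpbs will also match URLs that merely contain those letters somewhere in the path.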


local_default_doc



 type:

 string list

 used by:

 htdig

 default:

 index.html

 description:

 Set this to the default documents in a directory used by the server. This is used for local filesystem access to translate URLs like http://foo.com/ into something like /home/foo.com/index.html.

 The list should only contain names that the local server recognizes as default documents for directory URLs, as defined by the DirectoryIndex setting in Apache's srm.conf, for example. As of 3.1.5, this can be a string list rather than a single name, and htdig will use the first name that works. Since this requires a loop, setting the most common name first will improve performance. Special characters can be embedded in these names using %xx hex encoding.

 example:

 local_default_doc: default.html default.htm index.html index.htm






6 local_urls

>

 type:t

 string list

 used by:

! htdigi
<
 default:

 <empty><
e
 description:
h
; Set this to tell ht://Dig to access certain URLs throughp: local filesystems. At first ht://Dig will try to access4 pages with URLs matching the patterns through the8 filesystems specified. If it cannot find the file, or; if it doesn't recognize the file name extension, it willn: try the URL through HTTP instead. Note the example--the; equal sign and the final slashes in both the URL and thes directory path are critical.e:
The fallback to HTTP can be disabled by setting the1 local_urls_only  attribute to true. 2 To access user directory URLs through the local filesystem, set2 local_user_urls.5 The only file name extensions currently recognizede5 for local filesystem access are .html, .htm, .txt,d5 .asc, .ps, .eps and .pdf. For anything else, htdig=3 must ask the HTTP server for the file, so it can) determine the MIME content-type of it./1 As of 3.1.5, you can provide multiple mappings / of a given URL to different directories, andr/ htdig will use the first mapping that works. ( Special characters can be embedded in& these names using %xx hex encoding.4 For example, you can use %3D to embed an "=" sign in an URL pattern.<
/
 example:
s
f a
e6 local_urls: http://www.foo.com/=/usr/www/htdocs/
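
 The mapping described above can be sketched in a few lines of Python. This is only an illustration of the documented rules (prefix match, recognized extensions, first mapping that works), not htdig's actual code; the function names are hypothetical.

```python
from urllib.parse import unquote

# Extensions the description lists as recognized for local access.
LOCAL_EXTENSIONS = (".html", ".htm", ".txt", ".asc", ".ps", ".eps", ".pdf")

def parse_local_urls(value):
    """Split a local_urls value into (url_prefix, directory) pairs."""
    mappings = []
    for entry in value.split():
        # A literal "=" inside the URL pattern is written as %3D, so the
        # first "=" is the separator; decode %xx escapes afterwards.
        prefix, _, directory = entry.partition("=")
        mappings.append((unquote(prefix), unquote(directory)))
    return mappings

def candidate_paths(url, mappings):
    """Local paths to try, in order; an empty list means fall back to HTTP."""
    if not url.lower().endswith(LOCAL_EXTENSIONS):
        return []
    return [directory + url[len(prefix):]
            for prefix, directory in mappings
            if url.startswith(prefix)]
```

 With the example value above, http://www.foo.com/a/b.html would be tried as /usr/www/htdocs/a/b.html, while a .gif URL yields no local candidates.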






local_urls_only



 type:

boolean

 used by:

 htdig

 default:

 false

 description:

 Set this to tell ht://Dig to access files only through the local filesystem, for URLs matching the patterns in the local_urls or local_user_urls attribute. If it cannot find the file, it will give up rather than trying HTTP. This will not affect files outside of the scope of local_urls and local_user_urls, which will still be fetched by HTTP. To disable all non-local fetching of files, you'll need to set the start_url and limit_urls_to attributes to allow only URLs covered by the local filesystem.

 example:

 local_urls_only: true






local_user_urls



 type:

 string list

 used by:

 htdig

 default:

 <empty>

 description:

 Set this to access user directory URLs through the local filesystem. If you leave the "path" portion out, it will look up the user's home directory in /etc/passwd (or NIS or whatever). As with local_urls, if the files are not found, ht://Dig will try with HTTP. Again, note the example's format. To map http://www.my.org/~joe/foo/bar.html to /home/joe/www/foo/bar.html, try the example below.
 The fallback to HTTP can be disabled by setting the local_urls_only attribute to true. As of 3.1.5, you can provide multiple mappings.

 example:

 local_user_urls: http://www.my.org/=/home/,/www/
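
 A rough Python sketch of how one such entry maps a user URL to a path. It assumes the entry has the form URL=before,after shown in the example, and it does not cover the case where the path portion is left out; the function name is hypothetical.

```python
def map_user_url(url, entry):
    """Map a ~user URL to a local path using one local_user_urls entry."""
    prefix, _, rest = entry.partition("=")
    before, _, after = rest.partition(",")
    if not url.startswith(prefix + "~"):
        return None  # not a user directory URL under this prefix
    # Split "joe/foo/bar.html" into the user name and the remaining path.
    user, _, tail = url[len(prefix) + 1:].partition("/")
    return f"{before}{user}{after}{tail}"
```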






locale



 type:

string

 used by:

 htdig

 default:

 C

 description:

 Set this to whatever locale you want your search database to cover. It affects the way international characters are dealt with. On most systems a list of legal locales can be found in /usr/lib/locale. Also check the setlocale(3C) man page.

 example:

 locale: en_US






logging



 type:

boolean

 used by:

 htsearch

 default:

 false

 description:

 This sets whether htsearch should use syslog() to log search requests. If set, this will log requests with a default level of LOG_INFO and a facility of LOG_LOCAL5. For details on redirecting the log into a separate file or other actions, see the syslog.conf(5) man page. To set the level and facility used in logging, change LOG_LEVEL and LOG_FACILITY in the include/htconfig.h file before compiling.

 Each line logged by htsearch contains the following:
 REMOTE_ADDR [config] (match_method) [words] [logicalWords] (matches/matches_per_page) - page, HTTP_REFERER

 Where any of the above are null or empty, it puts in either '-' or 'default' (for config).

 example:

 logging: true






maintainer



 type:

string

 used by:

 htdig

 default:

bogus@unconfigured.htdig.user

 description:

 This should be the email address of the person in charge of the digging operation. This string is added to the user-agent: field when the digger sends a request to a server.

 example:

 maintainer: ben.dover@uptight.com






match_method



 type:

string

 used by:

 htsearch

 default:

 and

 description:

 This is the default method for matching that htsearch uses. The valid choices are:
  • or
  • and
  • boolean
 This attribute will only be used if the HTML form that calls htsearch didn't have the method value set.

 example:

 match_method: boolean






matches_per_page



 type:

number

 used by:

 htsearch

 default:

 10

 description:

 If this is set to a relatively small number, the matches will be shown in pages instead of all at once.

 example:

 matches_per_page: 999






max_description_length



 type:

number

 used by:

 htdig

 default:

 60

 description:

 While gathering descriptions of URLs, htdig will only record those descriptions which are shorter than this length. This is used mostly to deal with broken HTML. (If a hyperlink is not terminated with a </a> the description will go on until the end of the document.)

 example:

 max_description_length: 40






max_doc_size



 type:

number

 used by:

 htdig

 default:

100000

 description:

 This is the upper limit to the amount of data retrieved for documents. This is mainly used to prevent unreasonable memory consumption since each document will be read into memory by htdig.

 example:

 max_doc_size: 5000000






max_head_length



 type:

number

 used by:

 htdig

 default:

 512

 description:

 For each document retrieved, the top of the document is stored. This attribute determines the size of this block. The text that will be stored is only the text; no markup is stored.
 We found that storing 50,000 bytes will store about 95% of all the documents completely. This really depends on how much storage is available and how much you want to show.

 example:

 max_head_length: 50000






max_hop_count



 type:

number

 used by:

 htdig

 default:

999999

 description:

 Instead of limiting the indexing process by URL pattern, it can also be limited by the number of hops or clicks a document is removed from the starting URL. Unfortunately, this only works reliably when a complete index is created, not an update.
 The starting page will have hop count 0.

 example:

 max_hop_count: 4
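
 To illustrate what a hop limit means, here is a minimal Python sketch that assumes a breadth-first traversal over an in-memory link table. The function name and the `links` mapping are hypothetical stand-ins, not htdig internals, and treating documents exactly at the limit as still included is an assumption.

```python
from collections import deque

def crawl_within_hops(start_url, links, max_hop_count):
    """Return {url: hop_count} for pages within max_hop_count hops."""
    hops = {start_url: 0}  # the starting page has hop count 0
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if hops[url] >= max_hop_count:
            continue  # at the limit: do not follow this page's links
        for target in links.get(url, ()):
            if target not in hops:
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops
```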






max_keywords



 type:

number

 used by:

 htdig

 default:

 -1 (no limit)

 description:

 This attribute can be used to limit the number of keywords per document that htdig will accept from meta keywords tags. A value of -1 or less means no limit. This can help combat meta keyword spamming, by limiting the amount of keywords that will be indexed, but it will not completely prevent irrelevant matches in a search if the first few keywords in an offending document are not relevant to its contents.

 example:

 max_keywords: 10






max_meta_description_length



 type:

number

 used by:

 htdig

 default:

 512

 description:

 While gathering descriptions from meta description tags, htdig will truncate descriptions which are longer than this length.

 example:

 max_meta_description_length: 1000






max_prefix_matches



 type:

integer

 used by:

 htsearch

 default:

 1000

 description:

 The Prefix fuzzy algorithm could potentially match a very large number of words. This value limits the number of words each prefix can match. Note that this does not limit the number of documents that are matched in any way.

 example:

 max_prefix_matches: 100






max_stars



 type:

number

 used by:

 htsearch

 default:

 4

 description:

 When stars are used to display the score of a match, this value determines the maximum number of stars that can be displayed.

 example:

 max_stars: 6






maximum_pages



 type:

integer

 used by:

 htsearch

 default:

 10

 description:

 This value limits the number of page links that will be included in the page list at the bottom of the search results page. Note that this does not limit the number of documents that are matched in any way.

 example:

 maximum_pages: 20
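
 The interaction with matches_per_page comes down to simple arithmetic. The Python sketch below assumes the page list is simply capped at maximum_pages; the function name is hypothetical and this is not htsearch's own code.

```python
import math

def page_link_count(matches, matches_per_page, maximum_pages):
    """Number of page links shown for a given number of matches."""
    total_pages = math.ceil(matches / matches_per_page)
    return min(total_pages, maximum_pages)
```

 For example, 995 matches at 10 per page need 100 pages, but with maximum_pages set to 20 only 20 links would be listed.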






maximum_word_length



 type:

number

 used by:

 htdig and htsearch

 default:

 12

 description:

 This sets the maximum length of words that will be indexed. Words longer than this value will be silently truncated when put into the index, or searched in the index.

 example:

 maximum_word_length: 15






meta_description_factor



 type:

number

 used by:

 htdig

 default:

 50

 description:

 This is a factor which will be used to multiply the weight of words in any META description tags in a document. The number may be a floating point number. See also the title_factor and text_factor attributes.

 example:

 meta_description_factor: 20






metaphone_db



 type:

string

 used by:

 htfuzzy and htsearch

 default:

${database_base}.metaphone.db

 description:

 The database file used for the fuzzy "metaphone" search algorithm. This database is created by htfuzzy and used by htsearch.

 example:

 metaphone_db: ${database_base}.mp.db






method_names



 type:

 quoted string list

 used by:

 htsearch

 default:

 and All or Any boolean Boolean

 description:

 These values are used to create the method menu. It consists of pairs. The first element of each pair is one of the known methods, the second element is the text that will be shown in the menu for that method. This text needs to be quoted if it contains spaces. See the select list documentation for more information on how this attribute is used.

 example:

 method_names: or Or and And






minimum_prefix_length



 type:

number

 used by:

 htsearch

 default:

 1

 description:

 This sets the minimum length of prefix matches used by the "prefix" fuzzy matching algorithm. Words shorter than this will not be used in prefix matching.

 example:

 minimum_prefix_length: 2






minimum_word_length



 type:

number

 used by:

 htdig and htsearch

 default:

 3

 description:

 This sets the minimum length of words that will be indexed. Words shorter than this value will be silently ignored but still put into the excerpt.
 Note that by making this value less than 3, a lot more words that are very frequent will be indexed. It might be advisable to add some of these to the bad_words list.

 example:

 minimum_word_length: 2






modification_time_is_now



 type:

boolean

 used by:

 htdig

 default:

 false

 description:

 This sets ht://Dig's response to a server that does not return a modification date. If false, it stores nothing. By setting modification_time_is_now, it will store the current time if the server does not return a date. Though this will return incorrect dates in search results, it may cut down on reindexing from such servers when doing updates, provided they still honor the If-Modified-Since header. Caching servers such as WWWoffle and Squid seem to do this.

 example:

 modification_time_is_now: true






next_page_text



 type:

string

 used by:

 htsearch

 default:

[next]

 description:

 The text displayed in the hyperlink to go to the next page of matches.

 example:

 next_page_text: <img src="/htdig/buttonr.gif">






no_excerpt_show_top



 type:

boolean

 used by:

 htsearch

 default:

false

 description:

 If no excerpt is available, this option will act the same as excerpt_show_top, that is, it will show the top of the document.

 example:

 no_excerpt_show_top: yes






no_excerpt_text



 type:

string

 used by:

 htsearch

 default:

 <em>(None of the search words were found in the top of this document.)</em>

 description:

 This text will be displayed in place of the excerpt if there is no excerpt available. If this attribute is set to nothing (blank), the excerpt label will not be displayed in this case.

 example:

 no_excerpt_text:






noindex_start, noindex_end



 type:

string

 used by:

 htdig

 default:

 <!--htdig_noindex--> <!--/htdig_noindex-->

 description:

 The text encompassing a section of an HTML file that should be completely ignored when indexing. As in the defaults, this can be SGML comment declarations that can be inserted anywhere in the documents to exclude different sections from being indexed. However, existing tags can also be used; this is especially useful to exclude some sections from being indexed where the files to be indexed cannot be edited. The example shows how SCRIPT sections in 'uneditable' documents can be skipped; note how noindex_start does not contain an ending >: this allows for all SCRIPT tags to be matched regardless of attributes defined (different types or languages). Note that the match for this string is case insensitive.

 example:

noindex_start: <SCRIPT
 noindex_end: </SCRIPT>
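
 The effect of the two markers can be sketched in Python as follows. This is an illustration of case-insensitive start/end skipping, assuming ASCII markers and assuming that an unterminated section hides the rest of the document; the function name is hypothetical and this is not htdig's parser.

```python
def strip_noindex(html, start="<!--htdig_noindex-->", end="<!--/htdig_noindex-->"):
    """Remove every start..end span from html, matching case-insensitively."""
    lowered = html.lower()  # same length as html for ASCII input
    start_l, end_l = start.lower(), end.lower()
    kept, pos = [], 0
    while True:
        begin = lowered.find(start_l, pos)
        if begin == -1:
            kept.append(html[pos:])  # no more sections: keep the remainder
            break
        kept.append(html[pos:begin])
        finish = lowered.find(end_l, begin + len(start_l))
        if finish == -1:
            break  # assumption: an unterminated section hides the rest
        pos = finish + len(end_l)
    return "".join(kept)
```

 With noindex_start set to <SCRIPT and noindex_end set to </SCRIPT>, a tag such as <script type="x"> is matched too, because the start marker has no closing >.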






no_next_page_text



 type:

string

 used by:

 htsearch

 default:

[next]

 description:

 The text displayed where there would normally be a hyperlink to go to the next page of matches.

 example:

 no_next_page_text:






no_page_list_header



 type:

string

 used by:

 htsearch

 default:

 <empty>

 description:

 This text will be used as the value of the PAGEHEADER variable, for use in templates or the search_results_footer file, when all search results fit on a single page.

 example:

 no_page_list_header: <hr noshade size=2>All results on this page.<br>






no_page_number_text



 type:

 quoted string list

 used by:

 htsearch

 default:

 <empty>

 description:

 The text strings in this list will be used when putting together the PAGELIST variable, for use in templates or the search_results_footer file, when search results fit on more than one page. The PAGELIST is the list of links at the bottom of the search results page. There should be as many strings in the list as there are pages allowed by the maximum_pages attribute. If there are not enough, or the list is empty, the page numbers alone will be used as the text for the links. An entry from this list is used for the current page, as the current page is shown in the page list without a hypertext link, while entries from the page_number_text list are used for the links to other pages. The text strings can contain HTML tags to highlight page numbers or embed images. The strings need to be quoted if they contain spaces.

 example:

 no_page_number_text: <strong>1</strong> <strong>2</strong> \
   <strong>3</strong> <strong>4</strong> \
   <strong>5</strong> <strong>6</strong> \
   <strong>7</strong> <strong>8</strong> \
   <strong>9</strong> <strong>10</strong>






no_prev_page_text



 type:

string

 used by:

 htsearch

 default:

[prev]

 description:

 The text displayed where there would normally be a hyperlink to go to the previous page of matches.

 example:

 no_prev_page_text:






nothing_found_file



 type:

string

 used by:

 htsearch

 default:

 ${common_dir}/nomatch.html

 description:

 This specifies the file which contains the HTML text to display when no matches were found. The file should contain a complete HTML document.
 Note that this attribute could also be defined in terms of database_base to make it specific to the current search database.

 example:

 nothing_found_file: /www/searching/nothing.html






no_title_text



 type:

string

 used by:

 htsearch

 default:

filename

 description:

 This specifies the text to use in search results when no title is found in the document itself. If it is set to filename, htsearch will use the name of the file itself, enclosed in brackets (e.g. [index.html]).

 example:

 no_title_text: "No Title Found"






nph



 type:

boolean

 used by:

 htsearch

 default:

 false

 description:

 This attribute determines whether htsearch sends out full HTTP headers as required for an NPH (non-parsed header) CGI. Some servers assume CGIs will act in this fashion, for example MS IIS. If your server does not send out full HTTP headers, you should set this to true.

 example:

nph: true






page_list_header



 type:

string

 used by:

 htsearch

 default:

 <hr noshade size=2>Pages:<br>

 description:

 This text will be used as the value of the PAGEHEADER variable, for use in templates or the search_results_footer file, when the search results span more than one page.

 example:

 page_list_header:






page_number_separator



 type:

 quoted string list

 used by:

 htsearch

 default:

 " "

 description:

 The text strings in this list will be used when putting together the PAGELIST variable, for use in templates or the search_results_footer file, when search results fit on more than one page. The PAGELIST is the list of links at the bottom of the search results page. The strings in the list will be used in rotation, and will separate individual entries taken from page_number_text and no_page_number_text. There can be as many or as few strings in the list as you like. If there are not enough for the number of pages listed, it goes back to the start of the list. If the list is empty, a space is used. The text strings can contain HTML tags. The strings need to be quoted if they contain spaces, or to specify an empty string.

 example:

 page_number_separator: "</td> <td>"
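
 The rotation rule can be illustrated with a short Python sketch. It only models the behaviour described above (separators reused from the start of the list, a space when the list is empty); the function name is hypothetical and the entries stand in for already-rendered page links.

```python
from itertools import cycle

def build_page_list(entries, separators):
    """Join page entries, rotating through the separators."""
    if not entries:
        return ""
    seps = cycle(separators or [" "])  # empty list: a space is used
    parts = [entries[0]]
    for entry in entries[1:]:
        parts.append(next(seps))
        parts.append(entry)
    return "".join(parts)
```

 With two separators and four entries, the first separator is reused between the third and fourth entries.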






page_number_text



 type:

 quoted string list

 used by:

 htsearch

 default:

 <empty>

 description:

 The text strings in this list will be used when putting together the PAGELIST variable, for use in templates or the search_results_footer file, when search results fit on more than one page. The PAGELIST is the list of links at the bottom of the search results page. There should be as many strings in the list as there are pages allowed by the maximum_pages attribute. If there are not enough, or the list is empty, the page numbers alone will be used as the text for the links. Entries from this list are used for the links to other pages, while an entry from the no_page_number_text list is used for the current page, as the current page is shown in the page list without a hypertext link. The text strings can contain HTML tags to highlight page numbers or embed images. The strings need to be quoted if they contain spaces.

 example:

 page_number_text: <em>1</em> <em>2</em> \
   <em>3</em> <em>4</em> \
   <em>5</em> <em>6</em> \
   <em>7</em> <em>8</em> \
   <em>9</em> <em>10</em>






pdf_parser



 type:

string

 used by:

 htdig

 default:

 path/acroread -toPostScript

 description:

 Set this to the path of the program used to parse PDF files, including all command-line options. The program will be called with the parameters:
 infile outfile
 where infile is a file to parse and outfile is the PostScript output of the parser.

 The program is supposed to convert to a variant of PostScript, which is then parsed internally. Currently, only Adobe's acroread program has been tested as a pdf_parser. The default value of path is determined at compile time, to include the path to the acroread executable. This defaults to /usr/local/bin if the configuration program can't find acroread.

 To successfully index PDF files, be sure to set the max_doc_size attribute to a value larger than the size of your largest PDF file. PDF documents cannot be parsed if they are truncated.

 Note: There is a bug in Acrobat 4's acroread command, which causes it to fail when -pairs is used. Ht://Dig version 3.1.3 and later include a work-around for this bug such that when acroread is the parser, and the -pairs option is not given, the second parameter will be the output directory rather than the output file name.

 The pdftops program that is part of the xpdf package is not suitable as a pdf_parser, because its variant of PostScript is slightly different. However, an alternative is to use xpdf's pdftotext program as a component of an external parser with the xpdf 0.90 package installed on your system, as described in FAQ question 4.9.

 example:

 pdf_parser: /usr/local/Acrobat3/bin/acroread -toPostScript -pairs






prefix_match_character



 type:

string

 used by:

 htsearch

 default:

 *

 description:

 A null prefix character means that prefix matching should be applied to every search word. Otherwise a match is returned only if the word does not end in the characters specified.

 example:

 prefix_match_character: ing






prev_page_text



 type:

string

 used by:

 htsearch

 default:

[prev]

 description:

 The text displayed in the hyperlink to go to the previous page of matches.

 example:

 prev_page_text: <img src="/htdig/buttonl.gif">






remove_bad_urls



 type:

boolean

 used by:

 htmerge

 default:

 true

 description:

 If TRUE, htmerge will remove any URLs which were marked as unreachable by htdig from the database. If FALSE, it will not do this. When htdig is run in initial mode, documents which were referred to but could not be accessed should probably be removed, and hence this option should then be set to TRUE. However, if htdig is run to update the database, this may cause documents on a server which is temporarily unavailable to be removed. This is probably NOT what was intended, so this option should be set to FALSE in that case.

 example:

 remove_bad_urls: true






remove_default_doc



 type:

 string list

 used by:

 htdig

 default:

index.html

 description:

 Set this to the default documents in a directory used by the servers you are indexing. These document names will be stripped off of URLs when they are normalized, if one of these names appears after the final slash, to translate URLs like http://foo.com/index.html into http://foo.com/
 Note that you can disable stripping of these names during normalization by setting the list to an empty string. The list should only contain names that all servers you index recognize as default documents for directory URLs, as defined by the DirectoryIndex setting in Apache's srm.conf, for example.

 example:

 remove_default_doc: default.html default.htm index.html index.htm
or
 remove_default_doc:
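
 The normalization step can be sketched in Python like this. It is a simplified illustration of the rule stated above (strip the name only when it appears after the final slash); the function name is hypothetical and query strings are not considered.

```python
def strip_default_doc(url, default_docs):
    """Drop a trailing default document name from a URL, if present."""
    head, slash, last = url.rpartition("/")
    if slash and last in default_docs:
        return head + slash
    return url  # unchanged, including when the list is empty
```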






robotstxt_name



 type:

string

 used by:

 htdig

 default:

 htdig

 description:

 Sets the name that htdig will look for when parsing robots.txt files. This can be used to make htdig appear as a different spider than ht://Dig. Useful to distinguish between a private and a global index.

 example:

 robotstxt_name: myhtdig






script_name



 type:

string

 used by:

 htsearch

 default:

 <empty>

 description:

 Overrides the value of the SCRIPT_NAME environment attribute. This is useful if htsearch is not being called directly as a CGI program, but indirectly from within a dynamic .shtml page using SSI directives. Previously, you needed a wrapper script to do this, but this configuration attribute makes wrapper scripts obsolete for SSI and possibly for other server scripting languages, as well. (You still need a wrapper script when using PHP, though.)
 Check out the contrib/scriptname directory for a small example. Note that this attribute also affects the value of the CGI variable used in htsearch templates.

 example:

 script_name: /search/results.shtml






search_algorithm



 type:

 string list

 used by:

 htsearch

 default:

exact:1

 description:

 Specifies the search algorithms and their weight to use when searching. Each entry in the list consists of the algorithm name, followed by a colon (:) followed by a weight multiplier. The multiplier is a floating point number between 0 and 1. Current algorithms supported are:

 exact
   The default exact word matching algorithm. This will find only exactly matched words.
 soundex
   Uses a slightly modified soundex algorithm to match words. This requires that the soundex database be present. It is generated with the htfuzzy program.
 metaphone
   Uses the metaphone algorithm for matching words. This algorithm is more specific to the English language than soundex. It is generated with the htfuzzy program.
 endings
   This algorithm uses language specific word endings to find matches. Each word is first reduced to its word root and then all known legal endings are used for the matching. This algorithm uses two databases which are generated with htfuzzy.
 synonyms
   Performs a dictionary lookup on all the words. This algorithm uses a database generated with the htfuzzy program.
 substring
   Matches all words containing the queries as substrings. Since this requires checking every word in the database, this can really slow down searches considerably.
 prefix
   Matches all words beginning with the query strings. Uses the option prefix_match_character to decide whether a query requires prefix matching. For example "abc*" would perform prefix matching on "abc" since * is the default prefix_match_character.

 example:

 search_algorithm: exact:1 soundex:0.3
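
 As an illustration of the name:weight syntax, the following Python sketch parses a search_algorithm value into a dictionary. The function name and the range check are assumptions made for this example; htsearch's own parsing may differ.

```python
def parse_search_algorithm(value):
    """Turn 'exact:1 soundex:0.3' into {'exact': 1.0, 'soundex': 0.3}."""
    weights = {}
    for entry in value.split():
        name, _, weight = entry.partition(":")
        factor = float(weight)
        if not 0.0 <= factor <= 1.0:
            raise ValueError(f"weight for {name} must be between 0 and 1")
        weights[name] = factor
    return weights
```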






search_results_footer



 type:

string

 used by:

 htsearch

 default:

 ${common_dir}/footer.html

 description:

 This specifies a filename to be output at the end of search results. While outputting the footer, some variables will be expanded. Variables use the same syntax as the Bourne shell. If there is a variable VAR, the following will all be recognized:
  • $VAR
  • $(VAR)
  • ${VAR}
 The following variables are available:

 MATCHES
   The number of documents that were matched.
 PLURAL_MATCHES
   If MATCHES is not 1, this will be the string "s", else it is an empty string. This can be used to say something like "$(MATCHES) document$(PLURAL_MATCHES) were found"
 MAX_STARS
   The value of the max_stars attribute.
 LOGICAL_WORDS
   A string of the search words with either "and" or "or" between the words, depending on the type of search.
 WORDS
   A string of the search words with spaces in between.
 PAGEHEADER
   This expands to either the value of the page_list_header or no_page_list_header attribute depending on how many pages there are.

 Note that this file will NOT be output if no matches were found. In this case the nothing_found_file attribute is used instead. Also, this file will not be output if it is overridden by defining the search_results_wrapper attribute.

 example:

 search_results_footer: /usr/local/etc/ht/end-stuff.html
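
 The three variable forms can be recognized with a single regular expression, as in the Python sketch below. It is a minimal illustration only: the function name is hypothetical, and replacing unknown variables with an empty string is an assumption rather than documented htsearch behaviour.

```python
import re

# Matches $(VAR), ${VAR} and bare $VAR, in that order of preference.
_VAR = re.compile(r"\$(?:\((\w+)\)|\{(\w+)\}|(\w+))")

def expand(template, variables):
    """Expand template variables from a dict of values."""
    def repl(match):
        name = match.group(1) or match.group(2) or match.group(3)
        return str(variables.get(name, ""))  # assumption: unknown -> empty
    return _VAR.sub(repl, template)
```

 The bracketed forms matter when a variable is followed directly by text: in "document$(PLURAL_MATCHES) were found" the name ends at the closing parenthesis, whereas a bare $VAR runs to the end of the word.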






search_results_header



 type:

string

 used by:

 htsearch

 default:

 ${common_dir}/header.html

 description:

 This specifies a filename to be output at the start of search results. While outputting the header, some variables will be expanded. Variables use the same syntax as the Bourne shell. If there is a variable VAR, the following will all be recognized:
  • $VAR
  • $(VAR)
  • ${VAR}
 The following variables are available:

 MATCHES
   The number of documents that were matched.
 PLURAL_MATCHES
   If MATCHES is not 1, this will be the string "s", else it is an empty string. This can be used to say something like "$(MATCHES) document$(PLURAL_MATCHES) were found"
 MAX_STARS
   The value of the max_stars attribute.
 LOGICAL_WORDS
   A string of the search words with either "and" or "or" between the words, depending on the type of search.
 WORDS
   A string of the search words with spaces in between.

 Note that this file will NOT be output if no matches were found. In this case the nothing_found_file attribute is used instead. Also, this file will not be output if it is overridden by defining the search_results_wrapper attribute.

 example:

 search_results_header: /usr/local/etc/ht/start-stuff.html






search_results_wrapper



 type:

string

 used by:

 htsearch

 default:

 <empty>

 description:

 This specifies a filename to be output at the start and end of search results. This file replaces the search_results_header and search_results_footer files, with the contents of both in one file, and uses the pseudo-variable $(HTSEARCH_RESULTS) as a separator for the header and footer sections. If the filename is not specified, the file is unreadable, or the pseudo-variable above is not found, htsearch reverts to the separate header and footer files instead. While outputting the wrapper, some variables will be expanded, just as for the search_results_header and search_results_footer files.
 Note that this file will NOT be output if no matches were found. In this case the nothing_found_file attribute is used instead.

 example:

 search_results_wrapper: ${common_dir}/wrapper.html
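
 The separator logic can be sketched in Python as follows. The function name is hypothetical; the sketch only shows how a wrapper's contents divide into header and footer sections and when a fallback to the separate files would be needed.

```python
SEPARATOR = "$(HTSEARCH_RESULTS)"

def split_wrapper(text):
    """Return (header, footer), or None if the separator is missing."""
    header, sep, footer = text.partition(SEPARATOR)
    if not sep:
        return None  # caller reverts to the separate header and footer files
    return header, footer
```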






server_aliases



 type:

 string list

 used by:

 htdig

 default:

 <empty>

 description:

 This directive tells the indexer that servers have several DNS aliases, which all point to the same machine and are NOT virtual hosts. This allows you to ensure pages are indexed only once on a given machine, despite the alias used in a URL.

 example:

 server_aliases: foo.mydomain.com:80=www.mydomain.com:80 \
   bar.mydomain.com:80=www.mydomain.com:80






server_max_docs



 type:

integer

 used by:

 htdig

 default:

 -1 (no limit)

 description:

 This directive tells htdig to limit the dig to retrieve a maximum number of documents from each server. This can cause unusual behavior on update digs since the old URLs are stored alphabetically. Therefore, update digs will add additional URLs in pseudo-alphabetical order, up to the limit of the directive. However, it is most useful to partially index a server as the URLs of additional documents are entered into the database, marked as never retrieved.

 example:

 server_max_docs: 50






server_wait_time



 type:

integer

 used by:

 htdig

 default:

 0

 description:

 This directive tells htdig to ensure a server has had a delay (in seconds) from the beginning of the last connection. This can be used to prevent "server abuse" by digging without delay. It's recommended to set this to 10-30 (seconds) when indexing servers that you don't monitor yourself. Additionally, this directive can slow down local indexing if set, which may or may not be what you intended.

 example:

 server_wait_time: 20






sort



 type:

string

 used by:

 htsearch

 default:

 score

 description:

 This is the default sorting method that htsearch uses to determine the order in which matches are displayed. The valid choices are:
  • score
  • time
  • title
  • revscore
  • revtime
  • revtitle
 This attribute will only be used if the HTML form that calls htsearch didn't have the sort value set. The words date and revdate can be used instead of time and revtime, as both will sort by the time that the document was last modified, if this information is given by the server. The default is to sort by the score, which ranks documents by best match. The sort methods that begin with "rev" simply reverse the order of the sort. Note that setting this to something other than "score" will incur a slowdown in searches.

 example:

 sort: revtime






sort_names



 type:

 quoted string list

 used by:

 htsearch

 default:

 score Score time Time title Title revscore 'Reverse Score' revtime 'Reverse Time' revtitle 'Reverse Title'

 description:

 These values are used to create the sort menu. It consists of pairs. The first element of each pair is one of the known sort methods, the second element is the text that will be shown in the menu for that sort method. This text needs to be quoted if it contains spaces. See the select list documentation for more information on how this attribute is used.

 example:

 sort_names: score 'Best Match' time Newest title A-Z \
   revscore 'Worst Match' revtime Oldest revtitle Z-A
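
 A quoted string list of pairs like this one can be approximated in Python with the standard shlex module. This sketch assumes shell-like quoting is close enough to ht://Dig's own rules and that continuation lines have already been joined; the function name is hypothetical.

```python
import shlex

def parse_pairs(value):
    """Parse a quoted string list into (method, label) pairs."""
    items = shlex.split(value)  # honours 'single' and "double" quotes
    if len(items) % 2:
        raise ValueError("expected an even number of items")
    return list(zip(items[0::2], items[1::2]))
```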






soundex_db



 type:

string

 used by:

 htfuzzy and htsearch

 default:

 ${database_base}.soundex.db

 description:

 The database file used for the fuzzy "soundex" search algorithm. This database is created by htfuzzy and used by htsearch.

 example:

 soundex_db: ${database_base}.snd.db
f
_


star_blank



 type:

string

 used by:

 htsearch

 default:

 ${image_url_prefix}/star_blank.gif

 description:

 This specifies the URL to use to display a blank of the same size as the star defined in the star_image attribute or in the star_patterns attribute.

 example:

 star_blank: http://www.somewhere.org/icons/elephant.gif




star_image



 type:

string

 used by:

 htsearch

 default:

 ${image_url_prefix}/star.gif

 description:

 This specifies the URL to use to display a star. This allows you to use some other icon instead of a star. (We like the star...)
 The display of stars can be turned on or off with the use_star_image attribute and the maximum number of stars that can be displayed is determined by the max_stars attribute.
 Even though the image can be changed, the ALT value for the image will always be a '*'.

 example:

 star_image: http://www.somewhere.org/icons/elephant.gif




star_patterns



 type:

 string list

 used by:

 htsearch

 default:

 <empty>

 description:

 This attribute allows the star image to be changed depending on the URL or the match it is used for. This is mainly to make a visual distinction between matches on different web sites. The star image could be replaced with the logo of the company the match refers to.
 It is advisable to keep all the images the same size in order to line things up properly in a short result listing.
 The format is simple. It is a list of pairs. The first element of each pair is a pattern, the second element is a URL to the image for that pattern.

 example:

 star_patterns: http://www.sdsu.edu /sdsu.gif \
     http://www.ucsd.edu /ucsd.gif
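
The pattern/image pairs can be read as a lookup table keyed on the match URL. The Python sketch below shows one way such a lookup could work. It is not htsearch's implementation: the function name pick_star_image is hypothetical, and both the matching rule (a pattern matches when it occurs anywhere in the URL) and the first-match-wins order are assumptions, since the description does not spell them out.

```python
def pick_star_image(url, star_patterns, default_image):
    """Return the image paired with the first pattern found in url."""
    # The flat list alternates pattern, image, pattern, image, ...
    pairs = zip(star_patterns[0::2], star_patterns[1::2])
    for pattern, image in pairs:
        # Assumption: plain substring matching, first matching pair wins.
        if pattern in url:
            return image
    # No pattern matched: fall back to the regular star image.
    return default_image


PATTERNS = [
    "http://www.sdsu.edu", "/sdsu.gif",
    "http://www.ucsd.edu", "/ucsd.gif",
]

if __name__ == "__main__":
    print(pick_star_image("http://www.ucsd.edu/index.html", PATTERNS, "/star.gif"))
    print(pick_star_image("http://www.htdig.org/", PATTERNS, "/star.gif"))
```

The same pair layout is used by template_patterns, with a template file name in place of the image URL.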



start_ellipses



 type:

string

 used by:

 htsearch

 default:

 <b><tt>... </tt></b>

 description:

 When excerpts are displayed in the search output, this string will be prepended to the excerpt if there is text before the text displayed. This is just a visual reminder to the user that the excerpt is only part of the complete document.

 example:

 start_ellipses: ...




start_highlight



 type:

string

 used by:

 htsearch

 default:

 <strong>

 description:

 When excerpts are displayed in the search output, matched words will be highlighted using this string and end_highlight. You should ensure that highlighting tags are balanced, that is, any formatting tags that this string opens should be closed by end_highlight.

 example:

 start_highlight: <font color="#FF0000">

start_url



 type:

 string list

 used by:

 htdig

 default:

 http://www.htdig.org/

 description:

 This is the list of URLs that will be used to start a dig when there was no existing database. Note that multiple URLs can be given here.

 example:

 start_url: http://www.somewhere.org/alldata/index.html




substring_max_words



 type:

integer

 used by:

 htsearch

 default:

 25

 description:

 The Substring fuzzy algorithm could potentially match a very large number of words. This value limits the number of words each substring pattern can match. Note that this does not limit the number of documents that are matched in any way.

 example:

 substring_max_words: 100



synonym_dictionary



 type:

string

 used by:

 htfuzzy

 default:

 ${common_dir}/synonyms

 description:

 This points to a text file containing the synonym dictionary used for the synonyms search algorithm.
 Each line of this file has at least two words. The first word is the word to replace, the rest of the words are synonyms for that word.

 example:

 synonym_dictionary: /usr/dict/synonyms


synonym_db



 type:

string

 used by:

 htsearch and htfuzzy

 default:

 ${common_dir}/synonyms.db

 description:

 Points to the database that htfuzzy creates when the synonyms algorithm is used.
 htsearch uses this to perform synonym dictionary lookups.

 example:

 synonym_db: ${database_base}.syn.db


syntax_error_file



 type:

string

 used by:

 htsearch

 default:

 ${common_dir}/syntax.html

 description:

 This points to the file which will be displayed if a boolean expression syntax error was found.

 example:

 syntax_error_file: ${common_dir}/synerror.html




template_map



 type:

 string list

 used by:

 htsearch

 default:

 Long builtin-long builtin-long Short builtin-short builtin-short

 description:

 This maps match template names to internal names and template file names. It is a list of triplets. The first element in each triplet is the name that will be displayed in the FORMAT menu. The second element is the name used internally and the third element is a filename of the template to use.
 There are two predefined templates, namely builtin-long and builtin-short. If the filename is one of those, they will be used instead.
 More information about templates can be found in the htsearch documentation.

 example:

 template_map: Short short ${common_dir}/short.html \
     Normal normal builtin-long \
     Detailed detail ${common_dir}/detail.html
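
As an illustration of the triplet layout, the following Python sketch walks a template_map list three elements at a time and returns the template file for a given name. It is not htsearch's code: the function name resolve_template is hypothetical, and looking the template up by its internal name (the second element) is an assumption.

```python
def resolve_template(template_map, name):
    """Return the template file whose internal name equals name, else None."""
    # Step through the flat list in triplets: display name, internal name, file.
    for start in range(0, len(template_map) - 2, 3):
        display, internal, filename = template_map[start:start + 3]
        if internal == name:
            return filename
    return None


TEMPLATE_MAP = [
    "Short", "short", "${common_dir}/short.html",
    "Normal", "normal", "builtin-long",
    "Detailed", "detail", "${common_dir}/detail.html",
]

if __name__ == "__main__":
    print(resolve_template(TEMPLATE_MAP, "normal"))
    print(resolve_template(TEMPLATE_MAP, "detail"))
```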



template_name



 type:

string

 used by:

 htsearch

 default:

 builtin-long

 description:

 Specifies the default template if none is given by the search form. The value needs to correspond to an entry in the template_map.

 example:

 template_name: long




template_patterns



 type:

 string list

 used by:

 htsearch

 default:

 <empty>

 description:

 This attribute allows the results template to be changed depending on the URL or the match it is used for. This is mainly to make a visual distinction between matches on different web sites. The results for each site could thus be shown in a style matching that site.
 The format is simply a list of pairs. The first element of each pair is a pattern, the second element is the name of the template file for that pattern.
 More information about templates can be found in the htsearch documentation.
 Normally, when using this template selection method, you would disable user selection of templates via the format input parameter in search forms, as the two methods were not really designed to interact. Templates selected by URL patterns would override any user selection made in the form. If you want to use the two methods together, see the notes on combining them for an example of how to do this.

 example:

 template_patterns: http://www.sdsu.edu ${common_dir}/sdsu.html \
     http://www.ucsd.edu ${common_dir}/ucsd.html




text_factor



 type:

number

 used by:

 htdig

 default:

 1

 description:

 This is a factor which will be used to multiply the weight of words that are not in any special part of a document. Setting a factor to 0 will cause normal words to be ignored. The number may be a floating point number. See also the heading_factor_[1-6], title_factor, and keywords_factor attributes.

 example:

 text_factor: 0


timeout



 type:

number

 used by:

 htdig

 default:

 30

 description:

 Specifies the time the digger will wait to complete a network read. This is just a safeguard against unforeseen things like the all too common transformation from a network to a notwork.
 The timeout is specified in seconds.

 example:

 timeout: 42


title_factor



 type:

number

 used by:

 htdig

 default:

 100

 description:

 This is a factor which will be used to multiply the weight of words in the title of a document. Setting a factor to 0 will cause words in the title to be ignored. The number may be a floating point number. See also the heading_factor_[1-6] attribute.

 example:

 title_factor: 12




translate_amp



 type:

boolean

 used by:

 htdig

 default:

 false

 description:

 If set to false, the entity &amp; (or &#38;) will not be translated into its ASCII equivalent &. If translation were taking place, an excerpt containing a & might be misinterpreted by the browser and look unrecognizable to the user. For this reason, not translating this entity is the default behavior.

 example:

 translate_amp: true





translate_lt_gt



 type:

boolean

 used by:

 htdig

 default:

 false

 description:

 If set to false, the entities &lt; (or &#60;) and &gt; (or &#62;) will not be translated into their ASCII equivalents < and >. If translation were taking place, an excerpt containing < and > might be misinterpreted by the browser and look unrecognizable to the user. For this reason, not translating these entities is the default behavior.

 example:

 translate_lt_gt: true





translate_quot



 type:

boolean

 used by:

 htdig

 default:

 false

 description:

 If set to false, the entity &quot; (or &#34;) will not be translated into its ASCII equivalent ". If translation were taking place, an excerpt containing a " might be misinterpreted by the browser and look unrecognizable to the user. For this reason, not translating this entity is the default behavior.

 example:

 translate_quot: true


uncoded_db_compatible



 type:

boolean

 used by:

 htdig, htnotify, htmerge and htsearch

 default:

 true

 description:

 If set to true, databases in which some or all URLs are not encoded through the common_url_parts and url_part_aliases options can still be read. This comes at the cost of time for extra database accesses and of not getting the full effect of those two options.
 Note that the database still needs to be rebuilt if either or both of common_url_parts and url_part_aliases were non-empty when it was built or modified, or if they were set to anything other than the current values.
 If a to-string in url_part_aliases can occur in normal URLs, this option should be set to false to eliminate surprises.

 example:

 uncoded_db_compatible: false





url_list



 type:

string

 used by:

 htdig

 default:

 ${database_base}.urls

 description:

 This file is only created if create_url_list is set to true. It will contain a list of all URLs that were seen.

 example:

 url_list: /tmp/urls


url_log



 type:

string

 used by:

 htdig

 default:

 ${database_base}.log

 description:

 If htdig is run with the -l option and interrupted, it will write out its progress to this file. Note that if it has a large number of URLs to write, it may take some time to exit. This can especially happen when running update digs and the run is interrupted soon after beginning.

 example:

 url_log: /tmp/htdig.progress




url_part_aliases



 type:

 string list

 used by:

 htdig, htnotify, htmerge and htsearch

 default:

 <empty>

 description:

 A list of translation pairs, from and to, used when accessing the database. If a part of a URL matches the from-string of a pair, it will be translated into the to-string just before writing the URL to the database, and translated back just after reading it from the database.
 This is primarily used to provide an easy way to rename parts of URLs, e.g. changing www.example.com/~htdig to www.htdig.org. Two different configuration files for digging and searching are then used, with url_part_aliases having different from-strings, but identical to-strings.
 See also common_url_parts.
 Strings that are normally incorrect in URLs, or very seldom used, should be used as to-strings, since extra storage will be used each time one is found as a normal part of a URL. Translations will be performed with priority for the leftmost longest match. Each to-string must be unique and not be a part of any other to-string.
 Note that when this attribute is changed, the database should be rebuilt, unless the effect of "moving" the affected URLs in the database is wanted, as described above.

 example:

 url_part_aliases: http://search.example.com/~htdig *site \
     http://www.htdig.org/this/ *1 \
     .html *2
 url_part_aliases: http://www.htdig.org/ *site \
     http://www.htdig.org/that/ *1 \
     .htm *2
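
The effect of using different from-strings with identical to-strings can be seen in a small Python sketch. This is only a model of the behavior described above, not ht://Dig's implementation: the function names translate, encode and decode are hypothetical, and the URL being translated is made up for the illustration. The two alias lists are the ones from the example.

```python
def translate(url, pairs):
    """Rewrite url using (source, target) pairs, leftmost longest match first."""
    out = []
    i = 0
    while i < len(url):
        # All source strings that match at the current position.
        hits = [(src, dst) for src, dst in pairs if src and url.startswith(src, i)]
        if hits:
            # Among matches at the same position, prefer the longest source.
            src, dst = max(hits, key=lambda pair: len(pair[0]))
            out.append(dst)
            i += len(src)
        else:
            out.append(url[i])
            i += 1
    return "".join(out)


def encode(url, aliases):
    """from-string to to-string, as done just before writing to the database."""
    return translate(url, aliases)


def decode(stored, aliases):
    """to-string back to from-string, as done just after reading."""
    return translate(stored, [(dst, src) for src, dst in aliases])


# The two alias lists from the example: one for digging, one for searching.
DIG_ALIASES = [
    ("http://search.example.com/~htdig", "*site"),
    ("http://www.htdig.org/this/", "*1"),
    (".html", "*2"),
]
SEARCH_ALIASES = [
    ("http://www.htdig.org/", "*site"),
    ("http://www.htdig.org/that/", "*1"),
    (".htm", "*2"),
]

if __name__ == "__main__":
    stored = encode("http://www.htdig.org/this/page.html", DIG_ALIASES)
    print(stored)                          # form kept in the database
    print(decode(stored, SEARCH_ALIASES))  # form seen when searching
```

Because the stored form only contains to-strings, decoding it with the search configuration "moves" the URL without touching the database.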



use_meta_description



 type:

boolean

 used by:

 htsearch

 default:

 false

 description:

 If set to true, any META description tags will be used as excerpts by htsearch. Any documents that do not have META descriptions will retain their normal excerpts.

 example:

 use_meta_description: true




use_star_image



 type:

boolean

 used by:

 htsearch

 default:

 true

 description:

 If set to true, the star_image attribute is used to display up to max_stars images for each match.

 example:

 use_star_image: no





user_agent



 type:

string

 used by:

 htdig

 default:

 htdig

 description:

 This allows customization of the User-Agent header field sent when the digger requests a file from a server.

 example:

 user_agent: htdig-digger


valid_extensions



 type:

 string list

 used by:

 htdig

 default:

 <empty>

 description:

 This is a list of extensions on URLs which are the only ones considered acceptable. This list is used to supplement the MIME-types that the HTTP server provides with documents. Some HTTP servers do not have a correct list of MIME-types and so can advertise certain documents as text while they are some binary format. If the list is empty, then all extensions are acceptable, provided they pass other criteria for acceptance or rejection. If the list is not empty, only documents with one of the extensions in the list are parsed. See also bad_extensions.

 example:

 valid_extensions: .html .htm .shtml


valid_punctuation



 type:

string

 used by:

 htdig and htsearch

 default:

 .-_/!#$%^&'

 description:

 This is the set of characters which will be deleted from the document before determining what a word is. This means that if a document contains something like Andrew's the digger will see this as Andrews.
 The same transformation is performed on the keywords the search engine gets.
 See also the extra_word_characters attribute.

 example:

 valid_punctuation: -'
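
The deletion step can be pictured with a short Python sketch. It only models the character removal described above and says nothing about the rest of htdig's word handling; the function name strip_punctuation is hypothetical.

```python
DEFAULT_VALID_PUNCTUATION = ".-_/!#$%^&'"


def strip_punctuation(text, valid_punctuation=DEFAULT_VALID_PUNCTUATION):
    """Delete every character listed in valid_punctuation from text."""
    # Mapping a code point to None makes str.translate drop that character.
    table = {ord(ch): None for ch in valid_punctuation}
    return text.translate(table)


if __name__ == "__main__":
    print(strip_punctuation("Andrew's"))
    print(strip_punctuation("well-known", "-'"))
```

Applying the same function to the search keywords keeps the indexed words and the query words in the same form.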



version



 type:

string

 used by:

 htsearch

 default:

VERSION

 description:

 This specifies the value of the VERSION variable which can be used in search templates. The default value of this attribute is determined at compile time, and will not normally be set in configuration files.

 example:

 version: 3.1.2PL1


word_db



 type:

string

 used by:

 htdig, htmerge and htsearch

 default:

 ${database_base}.words.db

 description:

 This is the main word database. It is an index of all the words to a list of documents that contain the words. This database can grow large pretty quickly.

 example:

 word_db: ${database_base}.allwords.db




word_list



 type:

string

 used by:

 htdig and htmerge

 default:

 ${database_base}.wordlist

 description:

 This is the input file that htmerge uses to create the main words database specified by word_db. This file gets about as large as the main words database. If this file exists when htdig is running, it will append data to this file. htmerge will then use the existing data and the appended data to create a completely new main word database.

 example:

 word_list: ${database_base}.allwords.text



Andrew Scherpbier <andrew@contigo.com>
Last modified: $Date: 2000/02/24 16:42:52 $