ht://Dig: Configuration

 Configuration



ht://Dig Copyright © 1995-2000 The ht://Dig Group
Please see the file COPYING for license information.




ht://Dig requires a configuration file and several HTML files to operate correctly. Fortunately, when ht://Dig is installed, a very reasonable configuration is created and in most cases only minor modifications to the files are necessary.



Below, we will use the variables that were set in CONFIG to designate specific paths.



 Standard files:




${CONFIG_DIR}/htdig.conf



This is the main runtime configuration file for all programs that make up ht://Dig. The file is fully described in the Configuration file manual.



When ht://Dig is installed, several attributes will be customized to your particular environment, but for reference, here is a sample copy of what it can look like:


#
# Example config file for ht://Dig.
# Last modified 2-Sep-1996 by Andrew Scherpbier
#
# This configuration file is used by all the programs that make up ht://Dig.
# Please refer to the attribute reference manual for more details on what
# can be put into this file.  (http://www.htdig.org/confindex.html)
# Note that most attributes have very reasonable default values so you
# really only have to add attributes here if you want to change the defaults.
#
# What follows are some of the common attributes you might want to change.
#

#
# Specify where the database files need to go.  Make sure that there is
# plenty of free disk space available for the databases.  They can get
# pretty big.
#
database_dir:       /opt/www/htdig/db

#
# This specifies the URL where the robot (htdig) will start.  You can specify
# multiple URLs here.  Just separate them by some whitespace.
# The example here will cause the ht://Dig homepage and related pages to be
# indexed.
#
start_url:      http://www.htdig.org/

#
# This attribute limits the scope of the indexing process.  The default is to
# set it to the same as the start_url above.  This way only pages that are on
# the sites specified in the start_url attribute will be indexed and it will
# reject any URLs that go outside of those sites.
#
# Keep in mind that the value for this attribute is just a list of string
# patterns. As long as URLs contain at least one of the patterns it will be
# seen as part of the scope of the index.
#
limit_urls_to:      ${start_url}

#
# If there are particular pages that you definitely do NOT want to index, you
# can use the exclude_urls attribute.  The value is a list of string patterns.
# If a URL matches any of the patterns, it will NOT be indexed.  This is
# useful to exclude things like virtual web trees or database accesses.  By
# default, all CGI URLs will be excluded.  (Note that the /cgi-bin/ convention
# may not work on your web server.  Check the  path prefix used on your web
# server.)
#
exclude_urls:       /cgi-bin/ .cgi

#
# The excerpts that are displayed in long results rely on stored information
# in the index databases.  The compiled default only stores 512 characters of
# text from each document (this excludes any HTML markup...)  If you plan on
# using the excerpts you probably want to make this larger.  The only concern
# here is that more disk space is going to be needed to store the additional
# information.  Since disk space is cheap (! :-)) you might want to set this
# to a value so that a large percentage of the documents that you are going
# to be indexing are stored completely in the database.  At SDSU we found
# that by setting this value to about 50k the index would get 97% of all
# documents completely and only 3% was cut off at 50k.  You probably want to
# experiment with this value.
# Note that if you want to set this value low, you probably want to set the
# excerpt_show_top attribute to false so that the top excerpt_length characters
# of the document are always shown.
#
max_head_length:    10000

#
# Depending on your needs, you might want to enable some of the fuzzy search
# algorithms.  There are several to choose from and you can use them in any
# combination you feel comfortable with.  Each algorithm will get a weight
# assigned to it so that in combinations of algorithms, certain algorithms get
# preference over others.  Note that the weights only affect the ranking of
# the results, not the actual searching.
# The available algorithms are:
#   exact
#   endings
#   synonyms
#   soundex
#   metaphone
# By default only the "exact" algorithm is used with weight 1.
# Note that if you are going to use any of the algorithms other than "exact",
# you need to use the htfuzzy program to generate the databases that each
# algorithm requires.
#
search_algorithm:   exact:1 synonyms:0.5 endings:0.1

#
# The following are used to change the text for the page index.
# The defaults are just boring text numbers.  These images spice
# up the result pages quite a bit.  (Feel free to do whatever, though)
#
next_page_text:     <img src=/htdig/buttonr.gif border=0 align=middle width=30 height=30 alt=next>
no_next_page_text:
prev_page_text:     <img src=/htdig/buttonl.gif border=0 align=middle width=30 height=30 alt=prev>
no_prev_page_text:
page_number_text:   "<img src=/htdig/button1.gif border=0 align=middle width=30 height=30 alt=1>" \
                    "<img src=/htdig/button2.gif border=0 align=middle width=30 height=30 alt=2>" \
                    "<img src=/htdig/button3.gif border=0 align=middle width=30 height=30 alt=3>" \
                    "<img src=/htdig/button4.gif border=0 align=middle width=30 height=30 alt=4>" \
                    "<img src=/htdig/button5.gif border=0 align=middle width=30 height=30 alt=5>" \
                    "<img src=/htdig/button6.gif border=0 align=middle width=30 height=30 alt=6>" \
                    "<img src=/htdig/button7.gif border=0 align=middle width=30 height=30 alt=7>" \
                    "<img src=/htdig/button8.gif border=0 align=middle width=30 height=30 alt=8>" \
                    "<img src=/htdig/button9.gif border=0 align=middle width=30 height=30 alt=9>" \
                    "<img src=/htdig/button10.gif border=0 align=middle width=30 height=30 alt=10>"

#
# To make the current page stand out, we will put a border around the
# image for that page.
#
no_page_number_text:    "<img src=/htdig/button1.gif border=2 align=middle width=30 height=30 alt=1>" \
                    "<img src=/htdig/button2.gif border=2 align=middle width=30 height=30 alt=2>" \
                    "<img src=/htdig/button3.gif border=2 align=middle width=30 height=30 alt=3>" \
                    "<img src=/htdig/button4.gif border=2 align=middle width=30 height=30 alt=4>" \
                    "<img src=/htdig/button5.gif border=2 align=middle width=30 height=30 alt=5>" \
                    "<img src=/htdig/button6.gif border=2 align=middle width=30 height=30 alt=6>" \
                    "<img src=/htdig/button7.gif border=2 align=middle width=30 height=30 alt=7>" \
                    "<img src=/htdig/button8.gif border=2 align=middle width=30 height=30 alt=8>" \
                    "<img src=/htdig/button9.gif border=2 align=middle width=30 height=30 alt=9>" \
                    "<img src=/htdig/button10.gif border=2 align=middle width=30 height=30 alt=10>"
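
The sample file's comments describe limit_urls_to and exclude_urls as lists of string patterns, and the file itself uses ${start_url} to refer to another attribute. The following Python sketch illustrates that behaviour. It is a simplified reader written for this example, not ht://Dig's own parser; the function names parse_config, expand and in_scope and the test URLs are hypothetical.

```python
import re

def parse_config(text):
    """Parse 'name: value' attributes from htdig.conf-style text.

    Simplified assumption: only '#' comment lines, blank lines and
    backslash line continuations are handled.
    """
    attrs = {}
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        # Skip comments and blank lines unless we are inside a continuation.
        if not pending and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            pending += line[:-1].rstrip() + " "
            continue
        line = pending + line
        pending = ""
        name, sep, value = line.partition(":")
        if sep:
            attrs[name.strip()] = value.strip()
    return attrs

def expand(attrs, name):
    """Replace ${other} references in one attribute value (one level deep)."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: attrs.get(m.group(1), ""),
                  attrs[name])

def in_scope(url, limit_patterns, exclude_patterns):
    """Substring test following the limit_urls_to / exclude_urls comments."""
    allowed = any(p in url for p in limit_patterns)
    excluded = any(p in url for p in exclude_patterns)
    return allowed and not excluded

SAMPLE = """\
# comment lines are skipped
start_url:      http://www.htdig.org/
limit_urls_to:  ${start_url}
exclude_urls:   /cgi-bin/ .cgi
"""

attrs = parse_config(SAMPLE)
limits = expand(attrs, "limit_urls_to").split()
excludes = attrs["exclude_urls"].split()

print(in_scope("http://www.htdig.org/docs/page.html", limits, excludes))    # True
print(in_scope("http://www.htdig.org/cgi-bin/htsearch", limits, excludes))  # False
print(in_scope("http://example.com/", limits, excludes))                    # False
```

Plain substring matching is used because the comments describe both attributes as lists of string patterns; any further matching rules the real programs apply are not modelled here.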



${SEARCH_DIR}/search.html



This is the default search form. It is an example interface to the search engine, htsearch. The file contains a form whose action is a call to htsearch. The form passes several variables that htsearch uses; more about those can be found in the htsearch documentation.


An example file can be as follows:

f<html><head>.<title>ht://Dig WWW Search</title>
</head><body bgcolor="#eef7ff">
<h1>z<a href="http://www.htdig.org"><IMG SRC="@IMAGEDIR@/htdig.gif" align=bottom alt="ht://Dig" border=0></a>WWW Site Search</H1><hr noshade size=4>>4This search will allow you to search the contents of6all the publicly available WWW documents at this site.
<br>	<p>.5<form method="post" action="/cgi-bin/htsearch">$<font size=-1>!Match: <select name=method>e<option value=and>AllR<option value=or>Any</select>h"Format: <select name=format>%<option value=builtin-long>Longs'<option value=builtin-short>Short$</select>g!Sort by: <select name=sort>s<option value=score>Scorei<option value=time>Timea<option value=title>Title *<option value=revscore>Reverse Score(<option value=revtime>Reverse Time*<option value=revtitle>Reverse Title</select>l
</font>l1<input type=hidden name=config value=htdig> 0<input type=hidden name=restrict value="">/<input type=hidden name=exclude value="">"
<br>Search:o9<input type="text" size="30" name="words" value="">r*<input type="submit" value="Search">
</form>l<hr noshade size=4>h
</body>a
</html> e
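
To make the form variables concrete, here is a small Python sketch that builds the query string such a form would submit. The field names and default values are taken from the example form above. The helper name search_url is hypothetical, and sending the fields with GET is an assumption based on the refine-search form in header.html, which uses method="get" with the same fields.

```python
from urllib.parse import urlencode

def search_url(words, cgi="/cgi-bin/htsearch", method="and",
               fmt="builtin-long", sort="score", config="htdig",
               restrict="", exclude=""):
    """Build a GET URL carrying the same fields as the example search form."""
    fields = [
        ("method", method),      # and / or
        ("format", fmt),         # builtin-long / builtin-short
        ("sort", sort),          # score, time, title, rev...
        ("config", config),      # hidden field in the form
        ("restrict", restrict),  # hidden field, empty by default
        ("exclude", exclude),    # hidden field, empty by default
        ("words", words),        # the text box
    ]
    return cgi + "?" + urlencode(fields)

print(search_url("fuzzy search"))
# /cgi-bin/htsearch?method=and&format=builtin-long&sort=score&config=htdig&restrict=&exclude=&words=fuzzy+search
```

The /cgi-bin/htsearch path is the one used in the example form; adjust it to wherever htsearch is installed on your server.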



${COMMON_DIR}/header.html



This file is output before any of the search results. It can be customized, for example to reflect the look and feel of your web site. Note that this file is only the top part of the full HTML document that is produced when search results are displayed, so it should start with the proper HTML introductory tags and title.


This file is not simply copied. Instead, the search engine looks for special variables inside the file and replaces them with the appropriate values for the particular search. For more details on the use of these variables, consult the htsearch templates documentation.
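
As a rough illustration of this kind of substitution, the Python sketch below replaces $(NAME) placeholders in a template string. It is not htsearch's implementation: the function name render is hypothetical, and leaving unknown variables untouched is an assumption made for this example.

```python
import re

def render(template, values):
    """Replace each $(NAME) with values[NAME].

    Assumption for this sketch: names that are not in `values`
    are left in the output unchanged.
    """
    def repl(match):
        return values.get(match.group(1), match.group(0))
    return re.sub(r"\$\(([A-Z_]+)\)", repl, template)

header = "<title>Search results for '$(WORDS)'</title>"
print(render(header, {"WORDS": "fuzzy search"}))
# <title>Search results for 'fuzzy search'</title>
```

The variable names that htsearch actually recognizes are listed in the htsearch templates documentation.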


Below is the default header.html file that gets installed. Note that it contains a form to allow the user to refine the search.

<html><head><title>Search results for '$(WORDS)'</title></head>
<body bgcolor="#eef7ff">
<h2><img src="/htdig/htdig.gif">
Search results for '$(LOGICAL_WORDS)'</h2>
<hr noshade size=4>
<form method="get" action="$(CGI)">
<font size=-1>
<input type=hidden name=config value=$(CONFIG)>
<input type=hidden name=restrict value="$(RESTRICT)">
<input type=hidden name=exclude value="$(EXCLUDE)">
Match: $(METHOD)
Format: $(FORMAT)
Sort by: $(SORT)
<br>
Refine search:
<input type="text" size="30" name="words" value="$(WORDS)">
<input type="submit" value="Search">
</select>
</font>
</form>
<hr noshade size=1>
<b>Documents $(FIRSTDISPLAYED) - $(LASTDISPLAYED) of $(MATCHES) matches.
More <img src="/htdig/star.gif" alt="*">'s indicate a better match.
</b>
<hr noshade size=1>


${COMMON_DIR}/footer.html



This file is output after all the search results have been displayed. The same rules as for header.html apply to this file, except that it should contain all the closing HTML tags.


Below is the default footer.html file that gets installed. Note that it contains the page navigation variables.

<hr noshade>
Pages:<br>
$(PREVPAGE) $(PAGELIST) $(NEXTPAGE)
<hr noshade size=4>
<a href="http://www.htdig.org">
<img src="/htdig/htdig.gif" border=0>ht://Dig 3.0</a>
</body></html>


${COMMON_DIR}/nomatch.html



If a search produces no matches, this file is displayed. All the relevant variables will be replaced as in the header.html and footer.html files. The default nomatch.html is little more than header.html and footer.html appended:

<html><head><title>No match for '$(LOGICAL_WORDS)'</title></head>
<body bgcolor="#eef7ff">
<h1><img src="/htdig/htdig.gif">
Search results</h1>
<hr noshade size=4>
<h2>No matches were found for '$(LOGICAL_WORDS)'</h2>
<p>
Check the spelling of the search word(s) you used.
If the spelling is correct and you only used one word,
try using one or more similar search words with "<b>Any</b>."
</p><p>
If the spelling is correct and you used more than one
word with "<b>Any</b>," try using one or more similar search
words with "<b>Any</b>."</p><p>
If the spelling is correct and you used more than one
word with "<b>All</b>", try using one or more of the same words
with "<b>Any</b>."</p>
<hr noshade size=4>
<form method="get" action="$(CGI)">
<font size=-1>
<input type=hidden name=config value=$(CONFIG)>
<input type=hidden name=restrict value="$(RESTRICT)">
<input type=hidden name=exclude value="$(EXCLUDE)">
Match: $(METHOD)
Format: $(FORMAT)
Sort by: $(SORT)
<br>
Refine search:
<input type="text" size="30" name="words" value="$(WORDS)">
<input type="submit" value="Search">
</select>
</font>
</form>
<hr noshade size=4>
<a href="http://www.htdig.org">
<img src="/htdig/htdig.gif" border=0>ht://Dig 3.0</a>
</body></html>



${COMMON_DIR}/syntax.html



If a boolean expression search causes a syntax error, this file will be displayed.

<html><head><title>Error in Boolean search for '$(WORDS)'</title></head>
<body bgcolor="#eef7ff">
<h1><img src="/htdig/htdig.gif">
Error in Boolean search for '$(LOGICAL_WORDS)'</h1>
<hr noshade size=4>
Boolean expressions need to be 'correct' in order for the search
system to use them.
The expression you entered has errors in it.<br>
<blockquote><b>
$(SYNTAXERROR)
</b></blockquote>
<hr noshade size=4>
<form method="get" action="$(CGI)">
<font size=-1>
<input type=hidden name=config value=$(CONFIG)>
<input type=hidden name=restrict value="$(RESTRICT)">
<input type=hidden name=exclude value="$(EXCLUDE)">
Match: $(METHOD)
Format: $(FORMAT)
Sort by: $(SORT)
<br>
Refine search:
<input type="text" size="30" name="words" value="$(WORDS)">
<input type="submit" value="Search">
</select>
</font>
</form>
<hr noshade size=4>
<a href="http://www.htdig.org">
<img src="/htdig/htdig.gif" border=0>ht://Dig 3.0</a>
</body></html>

Andrew Scherpbier <andrew@contigo.com>
Last modified: $Date: 2000/02/17 22:05:21 $