
How to index your web pages
with SWISH and WWWWAIS
SWISH and WWWWAIS
are a pair of tools for indexing and searching web pages
using World-Wide Web forms.
This document explains how to build your own web indexes using
these tools
on web servers maintained by Information Technology at Rice University.
The following servers are included:
es.rice.edu
is.rice.edu
riceinfo.rice.edu
www.owlnet.rice.edu
www.ruf.rice.edu
Important: SWISH and WWWWAIS do not support
access control
in the manner of NCSA httpd. Therefore it is important that you
only use SWISH and WWWWAIS to index material which is intended for
a world-wide audience, not material restricted to the Rice campus
or some other limited audience.
Note that this document covers only a subset of the options of
SWISH and WWWWAIS
and that there are other ways to index web documents, particularly
if you run your own web server. See the "more information"
section below if you want to dig deeper.
Index your web pages with SWISH
Here are the steps necessary to create a searchable index of web
documents using SWISH:
- Make sure that you know how to create documents in
HTML (HyperText Markup Language) and publish them on a WWW server.
If you don't know how, see the
WWW documentation
in RiceInfo
or contact the Consulting Center at x4983 or riceinfo@rice.edu.
- Create the set of HTML documents which you wish to search
and put them in a place where they are served out by a web server
on which the SWISH and WWWWAIS software is installed
(e.g., under your ~/public_html directory on a machine
maintained by Information Technology).
It is helpful if your files are in a single subdirectory or set of
subdirectories and there is a simple rule for referring to them
using the syntax of the Unix shell, e.g.:
/home/jdoe/public_html/blahfiles/blah*.html
- Invoke the swish command to index your files.
While
SWISH
offers many options, you will
probably want to use the following:
/usr/site/swish/bin/swish \ (swish program)
-c /usr/site/swish/lib/swish.conf \ (swish configuration file)
-f /home/jdoe/public_html/blahfiles/blah.swish \ (output index file)
-i /home/jdoe/public_html/blahfiles/blah*.html \ (full path of the files to be indexed)
-v (verbose flag)
You may wish to
write a short shell script so you can conveniently run this command again
each time your index needs to be updated. Example:
#!/bin/csh -f
#
# rebuild index of "blah" data using swish
#
/usr/site/swish/bin/swish -c /usr/site/swish/lib/swish.conf \
-f /home/jdoe/public_html/blahfiles/blah.swish \
-i /home/jdoe/public_html/blahfiles/blah*.html -v
- Test your SWISH index.
You may invoke the swish command by hand to test the index file
you just built. For example, to search your index for the word "rice"
you would enter the following command:
/usr/site/swish/bin/swish -f /home/jdoe/public_html/blahfiles/blah.swish -w rice
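The indexing and testing steps above can also be combined in one small Bourne shell script. What follows is only a sketch: the variable names SWISH and SWISH_CONF and the function names rebuild_index and test_index are made up for illustration, while the paths and the flags (-c, -f, -i, -v, -w) are the ones shown in the steps above. The script refuses to run swish when the shell pattern matches no files.

```shell
#!/bin/sh
# Sketch only: rebuild a SWISH index and run a one-word test search.
# SWISH, SWISH_CONF and the function names are illustrative assumptions;
# the default paths and the flags are taken from the steps above.

SWISH="${SWISH:-/usr/site/swish/bin/swish}"
SWISH_CONF="${SWISH_CONF:-/usr/site/swish/lib/swish.conf}"

# rebuild_index INDEX_FILE FILE...
# Refuses to run if the file pattern matched nothing.
rebuild_index() {
    index_file=$1
    shift
    # sh passes an unmatched pattern through literally, so in that
    # case the first "file" argument does not exist on disk.
    if [ "$#" -eq 0 ] || [ ! -e "$1" ]; then
        echo "rebuild_index: no files to index" >&2
        return 1
    fi
    "$SWISH" -c "$SWISH_CONF" -f "$index_file" -i "$@" -v
}

# test_index INDEX_FILE WORD
# Searches the index for a single word, as in the test step above.
test_index() {
    "$SWISH" -f "$1" -w "$2"
}
```

With these definitions loaded, the example index could be rebuilt and then checked like this:

    rebuild_index /home/jdoe/public_html/blahfiles/blah.swish /home/jdoe/public_html/blahfiles/blah*.html
    test_index /home/jdoe/public_html/blahfiles/blah.swish rice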
You are now ready to create a web form to search your SWISH index.
Create a search form using WWWWAIS
Here are the steps necessary to search your SWISH index using WWWWAIS:
- The format of your search results.
Create an HTML document defining the appearance of the web
page that WWWWAIS will return with your search results.
Note that this file will deviate from strict HTML in two ways:
- The first line must consist of a special string to tell WWWWAIS that
it is okay to redistribute the file. (This is an extension
added at Rice in order to prevent misuse of the WWWWAIS gateway.)
The special string:
<!-- GatewayMayRedistribute -->
- The file is not a complete HTML file; it contains only the material
which will appear at the beginning of your search results. Do not
include the closing </body> and </html> tags.
Example search results format file:
<!-- GatewayMayRedistribute -->
<html>
<head>
<title>Blah database: search results</title>
</head>
<body>
<h1>Blah database: search results</h1>
Here are the results of your search of the Blah database.
- Your search form.
Create an HTML document containing your search form.
This will call for the use of the
<form> tag and several
<input> tags.
While
WWWWAIS
offers many options which may be set in web forms, you will probably
want to use the following:
- In the action attribute of the <form> tag, specify
the URL of the wwwwais CGI program on the web server you are using:
<form method=GET action="http://www.mydomain.rice.edu/cgi-bin/wwwwais">
- Specify an <input> tag for the user's search terms
and another for a search button:
Enter your search terms:
<input type=text name="keywords" size=20>
<input type=submit value="Search">
- In a hidden field with the name "sourcedir",
specify the directory where your SWISH index lives:
<input type=hidden name=sourcedir value="/home/jdoe/public_html/blahfiles">
- In a hidden field with the name "source",
specify the name of your SWISH index file:
<input type=hidden name=source value="blah.swish">
- In a hidden field with the name "pagetitle",
specify the full path of
the HTML document defining the format of your search results
(this is a local Rice extension to WWWWAIS):
<input type=hidden name=pagetitle value="/home/jdoe/public_html/blahfiles/results.html">
- In a hidden field with the name "maxhits", specify the maximum number of hits to be returned:
<input type=hidden name=maxhits value="40">
- In a hidden field with the name "sorttype",
specify the order by which hits should be sorted
(the most likely choices are "score" and "title"):
<input type=hidden name=sorttype value="title">
- In a hidden field with the name "searchprog",
specify the search program "swish":
<input type=hidden name=searchprog value="swish">
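Since the results format file named in "pagetitle" must begin with the special gateway string, it may be worth checking the file from the shell before you publish your form. The following is a minimal sketch; the function name check_results_format is made up for illustration, and it assumes the first line must match the string exactly, which the gateway may or may not require.

```shell
#!/bin/sh
# Sketch only: check that a results format file starts with the marker
# line described above. The function name is an illustrative assumption,
# and so is the exact-match comparison.

check_results_format() {
    marker='<!-- GatewayMayRedistribute -->'
    # Read just the first line of the file; fail if it cannot be read.
    first_line=$(head -n 1 "$1") || return 1
    if [ "$first_line" = "$marker" ]; then
        echo "ok: $1"
    else
        echo "missing marker on line 1: $1" >&2
        return 1
    fi
}
```

For the example above you would run:

    check_results_format /home/jdoe/public_html/blahfiles/results.html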
Here is an example <form> tag combining the above elements:
<form method=GET action="http://www.mydomain.rice.edu/cgi-bin/wwwwais">
Enter your search terms:
<input type=text name="keywords" size=20>
<input type=submit value="Search">
<input type=hidden name=sourcedir value="/home/jdoe/public_html/blahfiles">
<input type=hidden name=source value="blah.swish">
<input type=hidden name=pagetitle value="/home/jdoe/public_html/blahfiles/results.html">
<input type=hidden name=maxhits value="40">
<input type=hidden name=sorttype value="title">
<input type=hidden name=searchprog value="swish">
</form>
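Because the form uses method=GET, the browser sends all of these fields to wwwwais as a query string appended to the URL. If the curl program is available to you, the same request can be sent from the command line, which is a convenient way to test your setup before publishing the form. This is only a sketch: the names WWWWAIS_URL, CURL and search_blah are made up for illustration, the URL is the placeholder from the example form, and the field names and values are copied from that form.

```shell
#!/bin/sh
# Sketch only: send the same fields the example form sends, using curl.
# WWWWAIS_URL, CURL and search_blah are illustrative names; the field
# names and values are copied from the example form above.

WWWWAIS_URL="${WWWWAIS_URL:-http://www.mydomain.rice.edu/cgi-bin/wwwwais}"
CURL="${CURL:-curl}"

# search_blah KEYWORDS...
# -G makes curl append the --data-urlencode fields to the URL as a
# query string, which is what a method=GET form does.
search_blah() {
    "$CURL" -G "$WWWWAIS_URL" \
        --data-urlencode "keywords=$*" \
        --data-urlencode "sourcedir=/home/jdoe/public_html/blahfiles" \
        --data-urlencode "source=blah.swish" \
        --data-urlencode "pagetitle=/home/jdoe/public_html/blahfiles/results.html" \
        --data-urlencode "maxhits=40" \
        --data-urlencode "sorttype=title" \
        --data-urlencode "searchprog=swish"
}
```

For example, "search_blah rice" should return the same page of results as typing "rice" into the form and pressing the Search button.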
An example index
Here is an example of a searchable index built using SWISH and WWWWAIS:
For more information
For more information, see:
If you have questions, send e-mail to riceinfo@rice.edu or call or visit the Consulting Center, 527-4983, Mudd 103.
--
Prentiss Riddle
and the
RiceInfo support team
(riceinfo@rice.edu)
1998.10.08