create_word_hash(1) Expaminator create_word_hash(1)
NAME
create_word_hash - build a hash of word document frequencies from text files
SYNOPSIS
create_word_hash [-h] [-v] [-d] [-f] [-m] hashfile dir...
DESCRIPTION
create_word_hash creates a dictionary of the words found in
all text files in the specified directories, and assigns
to each word the number of files (or messages; see -m) in
which that word was found. The dictionary is stored as a
Berkeley DB hash at the path given by 'hashfile'.
Currently, the following characters, in addition to the
ASCII 'space' and 'tab' characters, are used as word
delimiters when parsing, and are therefore not considered
parts of words: @#%^&*()_+=|{}[]:;"'<>,?/
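The counting described above can be sketched in Python as
follows. This is a hypothetical illustration, not the tool's
own code: the names tokenize, split_mbox and
document_frequency are invented here, a plain dict-like
Counter stands in for the Berkeley DB hash, words are
treated as case-sensitive, and newlines are assumed to
separate words because input is presumably read line by
line. Each word is counted at most once per file (or per
message when -m is in effect).

```python
import re
from collections import Counter

# Delimiters listed in DESCRIPTION, plus ASCII space and tab.
DELIMITERS = "@#%^&*()_+=|{}[]:;\"'<>,?/ \t"
# Assumption: CR and LF also separate words (input read line by line).
_SPLIT_RE = re.compile("[" + re.escape(DELIMITERS) + "\r\n]+")


def tokenize(text):
    """Return the set of distinct words in text (case-sensitive)."""
    return {w for w in _SPLIT_RE.split(text) if w}


def split_mbox(text):
    """Split mbox-format text into messages, each starting at a 'From ' line.

    Text with no leading 'From ' line is returned as a single message.
    """
    messages, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


def document_frequency(file_texts, mbox=False):
    """Map each word to the number of files (or messages, if mbox) containing it."""
    counts = Counter()
    for text in file_texts:
        units = split_mbox(text) if mbox else [text]
        for unit in units:
            # A set per unit, so repeats within one file/message count once.
            counts.update(tokenize(unit))
    return counts
```

For example, document_frequency(["Hello, world",
"world (again)"]) maps "world" to 2 and "Hello" to 1.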
Command-line options:
-d write debugging messages (the name of each directory
and file processed).
-f 'force'; if 'hashfile' already exists, delete and
replace it.
-m assume email files may be in sendmail (same as Unix
'mail') 'mbox' format. (Such files may contain multiple
messages, each starting with a line that begins "From ".)
-h Help; print the command-line options and exit.
-v be verbose; print a dot for every message file
processed.
hashfile - pathname of the words/frequency-of-occurrence
hash to be created. If 'hashfile' is a bare file name
(no directory component), the directory named by the
environment variable '$SPAMDIR' is prepended to it. If
'hashfile' is a bare file name and '$SPAMDIR' is not
set, the current working directory is used.
dir... list of directories containing email files; each
directory is processed recursively.
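The path rules for 'hashfile' can be sketched in Python as
follows. The function name resolve_hashfile is hypothetical,
and treating an empty $SPAMDIR the same as an unset one is
an assumption, not documented behavior. A name such as
"./words.db" has a directory component and is therefore used
unchanged.

```python
import os


def resolve_hashfile(hashfile, environ=None):
    """Return the path of the hash to create (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    if os.path.basename(hashfile) != hashfile:
        # Has a directory component: not a bare file name, use as given.
        return hashfile
    spamdir = environ.get("SPAMDIR")
    if spamdir:  # assumption: empty $SPAMDIR is treated as unset
        return os.path.join(spamdir, hashfile)
    # Bare file name and $SPAMDIR unset: use the current working directory.
    return os.path.join(os.getcwd(), hashfile)
```

For example, with SPAMDIR=/var/spam, the bare name
"words.db" resolves to "/var/spam/words.db".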
ENVIRONMENT
$SPAMDIR, as discussed above.
FILES
Required: at least one directory containing text files.
Every text file within each directory tree is processed.
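The recursive search for text files can be sketched in
Python as follows. This page does not say how a text file
is recognized, so the NUL-byte test in is_probably_text is
an assumed heuristic; both function names are hypothetical.
Files are visited in sorted order only to make the output
deterministic.

```python
import os


def is_probably_text(path, blocksize=512):
    """Assumed heuristic: a file is text if its first block has no NUL byte."""
    with open(path, "rb") as f:
        return b"\0" not in f.read(blocksize)


def find_text_files(dirs):
    """Yield every (probable) text file under each directory, recursively."""
    for top in dirs:
        for root, subdirs, files in os.walk(top):
            subdirs.sort()  # sort in place so os.walk descends in sorted order
            for name in sorted(files):
                path = os.path.join(root, name)
                if os.path.isfile(path) and is_probably_text(path):
                    yield path
```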
SEE ALSO
make_new_database, create_probability_hash
COPYRIGHT
Copyright (c) 2002, J.B.Ward
<bward2@users.sourceforge.net>
Expaminator Nov.29,2002 create_word_hash(1)