create_word_hash(1)        Expaminator        create_word_hash(1)



NAME
       create_word_hash - count the files in which each word occurs

SYNOPSIS
       create_word_hash  [-h] [-v] [-d] [-f] [-m]  hashfile  dir...

DESCRIPTION
       create_word_hash creates a dictionary of the words found in
       all text files in the specified list of directories, and
       assigns to each word the number of files (or messages; see
       -m) in which that word was found. The dictionary is stored
       as a Berkeley DB hash in the file named by 'hashfile'.

       Currently, the following characters, in addition to the
       ASCII 'space' and 'tab' characters, are used as delimiters
       when parsing, and are therefore never part of a word:

           @#%^&*()_+=|{}[]:;"'<>,?/
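
       The counting rule above can be sketched as follows. This is
       an illustrative Python sketch, not the program's own source:
       the names words_in and count_documents are hypothetical,
       newlines are treated as delimiters by assumption, and words
       are compared case-sensitively because this page does not say
       otherwise.

```python
import re
from collections import Counter

# Delimiters listed in this page, plus space and tab. Carriage return and
# newline are added as an assumption so that multi-line text splits sensibly.
DELIMITERS = "@#%^&*()_+=|{}[]:;\"'<>,?/" + " \t\r\n"
SPLIT_RE = re.compile("[" + re.escape(DELIMITERS) + "]+")


def words_in(text):
    """Return the set of distinct words in one file or message."""
    return {word for word in SPLIT_RE.split(text) if word}


def count_documents(texts):
    """Map each word to the number of texts in which it appears."""
    counts = Counter()
    for text in texts:
        # words_in returns a set, so a word repeated within one text
        # still adds only 1 to its count.
        counts.update(words_in(text))
    return counts
```

       Characters that are not in the delimiter list, such as '!'
       or '.', stay attached to the word in this sketch.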

       Command-line options:

       -d  write debugging messages (the name of each directory and
           file processed).

       -f  'force'; if 'hashfile' already exists, delete and
           replace it.

       -m  assume that email files may be in 'mbox' format, as used
           by sendmail and the Unix 'mail' program. Such files may
           contain multiple messages, each starting with a line
           that matches "^From "; words are then counted per
           message rather than per file.

       -h  Help; print the command-line options and exit.

       -v  be verbose; print a dot for every message file
           processed.

       hashfile  pathname of the word/frequency-of-occurrence hash
           to be created. If 'hashfile' is a bare file name, the
           value of the environment variable '$SPAMDIR' is
           prepended to it. If 'hashfile' is a bare file name and
           '$SPAMDIR' is not set, the current working directory
           is used.

       dir...  list of directories containing email  files;  each
           directory is processed recursively.
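
       How -m divides a file into messages can be sketched as
       follows. This is an illustrative Python sketch under the
       assumption that every line beginning with "From " opens a
       new message; the name split_mbox is hypothetical and the
       program's actual parsing may differ.

```python
def split_mbox(text):
    """Split mbox-style text into messages at lines that start with 'From '."""
    messages = []
    current = []
    for line in text.splitlines(keepends=True):
        # A "From " line closes the message collected so far, if any.
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages
```

       A file that contains no "From " line comes back as a single
       message, so in this sketch a plain text file is counted the
       same way with or without -m.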


ENVIRONMENT
       $SPAMDIR, as discussed above.
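
       The lookup rule for 'hashfile' can be sketched as follows.
       This is an illustrative Python sketch: the name
       resolve_hashfile is hypothetical, and treating an empty
       $SPAMDIR the same as an unset one is an assumption.

```python
import os


def resolve_hashfile(hashfile, env=None):
    """Return the path at which 'hashfile' would be created."""
    env = os.environ if env is None else env
    # A name with a directory part is not "bare": use it as given.
    if os.path.dirname(hashfile):
        return hashfile
    # Bare name: prefer $SPAMDIR, else the current working directory.
    # Assumption: an empty SPAMDIR is treated as unset.
    base = env.get("SPAMDIR") or os.getcwd()
    return os.path.join(base, hashfile)
```

       The optional env argument exists only so the rule can be
       tried without changing the real environment.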


FILES
       At least one directory containing text files is required.
       Every text file within each directory is processed.


SEE ALSO
       make_new_database, create_probability_hash


COPYRIGHT
       Copyright (c) 2002, J.B.Ward
       <bward2@users.sourceforge.net>




Expaminator                Nov.29,2002        create_word_hash(1)