create_word_hash(1)            Expaminator            create_word_hash(1)

NAME
       create_word_hash - build a Berkeley DB hash of per-word file counts

SYNOPSIS
       create_word_hash [-v] [-d] [-f] [-m] [-h] hashfile dir...

DESCRIPTION
       create_word_hash creates a dictionary of the words found in all
       text files in the specified list of directories, and assigns to
       each word the number of files (or messages; see -m) in which that
       word was found. The dictionary is stored as a Berkeley DB hash in
       the file named by 'hashfile'.

       Currently, the following characters, in addition to the ASCII
       'space' and 'tab' characters, are used as delimiters when parsing,
       and are therefore never considered part of a word:

              @#%^&*()_+=|{}[]:;"'<>,?/

       Command-line options:

       -d     Write debugging messages (the name of each directory and
              file processed).

       -f     'Force'; if 'hashfile' already exists, delete and replace
              it.

       -m     Assume that email files may be in sendmail 'mbox' format
              (the same format used by Unix 'mail'). Such files may
              contain multiple messages, each starting with a "^From "
              line.

       -h     Help; print the command-line options and exit.

       -v     Be verbose; print a dot for every message file processed.

       hashfile
              Pathname of the words/frequency-of-occurrence hash to be
              created. If 'hashfile' is a bare file name, the value of
              the environment variable '$SPAMDIR' is prepended. If
              'hashfile' is a bare file name and '$SPAMDIR' is not set,
              the current working directory is used.

       dir...
              List of directories containing email files; each directory
              is processed recursively.

ENVIRONMENT
       $SPAMDIR, as discussed above.

FILES
       Required: at least one directory containing text files. Any text
       file within the directory will be processed.

SEE ALSO
       make_new_database, create_probability_hash

COPYRIGHT
       Copyright (c) 2002, J.B.Ward <bward2@users.sourceforge.net>

Expaminator                    Nov.29,2002            create_word_hash(1)
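
The counting rule described above (each word is credited once per file or message, not once per occurrence) can be illustrated with a minimal Python sketch. This is not the tool's own code. The function names words_in, count_documents, split_mbox and resolve_hashfile are hypothetical, newlines are assumed to act as delimiters alongside the listed characters, and a "bare file name" is assumed to mean a name with no directory component. The sketch keeps counts in memory and does not write a Berkeley DB file.

```python
import os
import re
from collections import Counter

# Delimiters listed in the man page, plus space and tab. Treating CR/LF as
# delimiters as well is an assumption; the page does not mention them.
DELIMITERS = "@#%^&*()_+=|{}[]:;\"'<>,?/ \t\r\n"
_SPLIT_RE = re.compile("[" + re.escape(DELIMITERS) + "]+")


def words_in(text):
    """Return the set of distinct words in one file or message."""
    return {w for w in _SPLIT_RE.split(text) if w}


def count_documents(texts):
    """Map each word to the number of texts containing it at least once."""
    counts = Counter()
    for text in texts:
        counts.update(words_in(text))  # a set, so each text counts once
    return counts


def split_mbox(text):
    """Split mbox-style text into messages at lines starting with 'From '."""
    messages, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


def resolve_hashfile(hashfile, environ=os.environ):
    """Prepend $SPAMDIR, or the current directory, to a bare file name."""
    if os.path.dirname(hashfile):
        return hashfile  # already has a directory component
    return os.path.join(environ.get("SPAMDIR") or os.getcwd(), hashfile)
```

Because '.', '!' and '-' are not in the delimiter list, a token such as "free!" is counted as a word distinct from "free", and "user@example.com" yields the two words "user" and "example.com".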