pwebstats: configuring and running

Table of Contents

  1. How to use pwebstats.
  2. The Configuration File.
  3. Auxiliary Programs.

Using pwebstats

pwebstats creates a collection of HTML files and images in a set of directories under the output directory (outdir, described below). At present, pwebstats produces statistics over daily, weekly or monthly periods. Note: the input logfile(s) must first be split into separate daily, weekly or monthly files; a utility, log-splitter.pl, is included to help you do this.
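The splitting requirement can be illustrated with a short Python sketch. It is not log-splitter.pl. It assumes Common Log Format input with timestamps like [15/Mar/2000:10:23:45 +1100], and it assumes ISO weeks (starting on Monday) for the weekly interval. The function names period_key and split_by_period are hypothetical.

```python
import re
from collections import defaultdict
from datetime import date

# Month abbreviations as they appear in Common Log Format timestamps.
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Captures day, month and year from a timestamp such as [15/Mar/2000:10:23:45 +1100].
STAMP = re.compile(r"\[(\d{2})/([A-Za-z]{3})/(\d{4}):")


def period_key(line, interval):
    """Return the bucket (day, ISO week or month) that a log line falls in."""
    m = STAMP.search(line)
    if m is None:
        return None  # no recognisable timestamp; the caller skips the line
    day = date(int(m.group(3)), MONTHS[m.group(2)], int(m.group(1)))
    if interval == "daily":
        return day.isoformat()
    if interval == "weekly":
        year, week, _ = day.isocalendar()  # ISO weeks start on Monday
        return f"{year}-W{week:02d}"
    if interval == "monthly":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unsupported interval: {interval}")


def split_by_period(lines, interval):
    """Group log lines into {period: [lines]}, preserving input order."""
    buckets = defaultdict(list)
    for line in lines:
        key = period_key(line, interval)
        if key is not None:
            buckets[key].append(line)
    return dict(buckets)
```

Each bucket could then be written to its own file and given to pwebstats through the logfile setting.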

  1. Edit the configuration file (./conf/pwebstats.conf) to reflect the location of the pwebstats distribution directory on your system, the location of the output directory, and your site-specific details.

  2. Some config details can also be given on the command line. Type ./pwebstats for a full list of options.

  3. Type the following command in the distribution directory to run pwebstats:
    ./pwebstats -c conf/pwebstats.conf
    The output is written to the directory specified in the config file or on the command line.

  4. If you want page-specific stats generated, have a look at the file ./conf/pwebstats.pages, which is the configuration file for the page-based part of pwebstats. Each entry is a series of colon-separated fields:
    Perl pattern for an HTML collection (or just the path to an HTML file):
    Description of the file or collection:
    Level of indentation (for subsections):
    URL of the page itself (to create an active link)

    Some examples are available.

    I've made a copy of our local page-config file, and of the pwebstats output for that page, available so that you can get an idea of what can be done with page-based stats.

    See http://www.its.unimelb.edu.au/manuals/perl5/perlre.html for details on Perl regular expressions.
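A minimal Python sketch of how one pwebstats.pages entry could be read and matched against a request path. It assumes the pattern and description contain no colons, and that the indentation level is an integer. The split is limited to three colons so the URL (which contains "http:") stays whole. Python's re module is close to, but not identical to, Perl regular expressions. The names parse_page_entry and matching_entries are hypothetical.

```python
import re


def parse_page_entry(line):
    """Split one pwebstats.pages entry into its four fields.

    Assumes the pattern and description contain no colons; limiting the
    split to three colons keeps the URL field intact.
    """
    pattern, description, level, url = line.rstrip("\n").split(":", 3)
    return {
        "pattern": re.compile(pattern),
        "description": description,
        "indent": int(level),
        "url": url,
    }


def matching_entries(path, entries):
    """Return the descriptions of every entry whose pattern matches path."""
    return [e["description"] for e in entries if e["pattern"].search(path)]
```

For example, a hypothetical entry ^/library/:Library pages:1:http://www.example.edu/library/ would collect every request under /library/.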

The Configuration File

The configuration file (./conf/pwebstats.conf) controls the setup information needed for pwebstats to run, and the user-settable limits and variables.

Lines starting with a # are comments and are ignored, as are blank lines.

All other lines are of the form variable:setting (the colon is necessary).

Use full pathnames where pathnames are to be specified (no trailing '/').
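The rules above can be sketched as a small Python reader. This is an illustration, not the parser pwebstats itself uses. It assumes each line is split on its first colon only, so that colons inside a setting (for example in a regular expression) are preserved, and that surrounding whitespace is insignificant. The name read_config is hypothetical.

```python
def read_config(lines):
    """Parse pwebstats.conf-style lines into a dict of variable -> setting."""
    conf = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines are ignored
        if ":" not in line:
            raise ValueError(f"missing colon: {raw!r}")
        name, setting = line.split(":", 1)  # split on the first colon only
        conf[name.strip()] = setting.strip()
    return conf
```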

Config Variables

server
Unique nickname for server - use only a-z, A-Z and '_'.
Server_header
Header for index page.
logfile
Location of log file (full pathname).
logtype
Type of logfile.
Acceptable values are: common (Common Log Format), squid, squid-emulated, ncsa-extended, and netscape-proxy. Defaults to common.
outdir
Directory location for the output of pwebstats (full pathname).
templates
Directory containing GIF templates (full pathname).
interval
Stats collection interval - can be daily, weekly, monthly or quarterly.
verbose
Verbose output - progress bar and other details when pwebstats is running (any value = on).
fly_prog
Location of 'fly' program (full pathname).
page_config
Location of page-based stats config file (full pathname).
host_threshold
Threshold for inclusion in all hosts list (default = 25).
item_threshold
Threshold for inclusion in all requests list (default = 25).
domain_threshold
Threshold for inclusion in all domains list (default = 5).
protocol_threshold
Threshold for inclusion in all protocols list (default = 25).
local_patt
Regular expression matching hosts in your local domain (hostnames or IP addresses)
e.g.: local_patt:\.unimelb\.edu\.au$|^128\.250|\.mu\.oz\.au$
exclude
Regular expression for items to exclude from display in the request stats (they are still counted in the totals)
complete_exclude_host
Completely ignore accesses from this set of hostnames ( | is the delimiter)
e.g. complete_exclude_host:foo1.users.bar.com|foo2.users.bar.com|foo3.users.bar.com
complete_exclude_url_patt
Completely ignore accesses to URLs matching this pattern
e.g. complete_exclude_url_patt:^/foo/bar/.*$|^/robots\.txt$
complete_exclude_user
Completely ignore accesses from this set of users ( | is the delimiter)
e.g. complete_exclude_user:tom|dick|harry
dns_lookup
Convert IP numbers in the hostname field to fully-qualified domain names (any value = on).
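The following Python sketch shows how local_patt and the complete_exclude_* settings could be applied to a Common Log Format entry. It is an illustration, not pwebstats' own code. It assumes that host and user exclusions are exact, case-sensitive matches against the '|'-separated lists, and that the URL pattern is searched against the request path. The names parse_clf and make_filters are hypothetical.

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[[^\]]+\] '
    r'"\S+ (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_clf(line):
    """Return host, user, url and status from a CLF line, or None."""
    m = CLF.match(line)
    return m.groupdict() if m else None


def make_filters(conf):
    """Build is_local() and ignored() predicates from config settings."""
    local = re.compile(conf["local_patt"]) if conf.get("local_patt") else None
    url_patt = conf.get("complete_exclude_url_patt")
    url_re = re.compile(url_patt) if url_patt else None
    # '|' separates exact names in the host and user exclusion lists.
    hosts = {h for h in conf.get("complete_exclude_host", "").split("|") if h}
    users = {u for u in conf.get("complete_exclude_user", "").split("|") if u}

    def is_local(entry):
        return local is not None and local.search(entry["host"]) is not None

    def ignored(entry):
        return (entry["host"] in hosts
                or entry["user"] in users
                or (url_re is not None and url_re.search(entry["url"]) is not None))

    return is_local, ignored
```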

An example config file.

Additionally, in a configuration file for a proxy server, the following directives are applicable:

remote_host_threshold
Threshold for inclusion in all remote hosts list (default = 25)
exclude_reqs
Exclude the requests/accesses array; this saves time and a lot of memory (any value = on).
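The threshold settings (host_threshold, item_threshold, domain_threshold, protocol_threshold and remote_host_threshold) all limit which entries appear in a list. A minimal Python sketch of that filtering is shown below. It assumes an entry is listed when its count is at least the threshold; whether pwebstats uses "at least" or "more than" is not stated here. The name over_threshold is hypothetical.

```python
from collections import Counter


def over_threshold(values, threshold):
    """Return (value, count) pairs seen at least `threshold` times, busiest first."""
    counts = Counter(values)
    return [(v, n) for v, n in counts.most_common() if n >= threshold]
```

For the hosts list, this might be called as over_threshold(hosts, int(conf.get("host_threshold", 25))), using the documented default of 25.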

Auxiliary programs

The following extra programs and scripts are included in the pwebstats distribution, in the utilities directory.

log-splitter.pl
This will split an existing log file into weekly or monthly files for input to pwebstats. Type ./log-splitter.pl for usage information.
rotatelogs.sh
Handy utility for rolling over logfiles, restarting the server and general cleaning-up.
ns-proxy-splitter.pl
This will split a Netscape Proxy extended log file into CERN-style proxy and cache logs.
run-up.sh
Simple shell script to feed all your old weekly/monthly logs into pwebstats. If you just have one big logfile, run it through log-splitter.pl first.
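The idea behind run-up.sh can be sketched in Python. This is not the distributed script. It assumes that rewriting only the logfile setting is enough for each run, that log filenames sort chronologically, and that the script runs from the distribution directory; it uses only the documented -c option. The names conf_for_log and run_all, and the temporary file run-up.conf, are hypothetical.

```python
import subprocess
from pathlib import Path


def conf_for_log(template_lines, logfile):
    """Copy config lines, replacing (or adding) the logfile setting."""
    out, seen = [], False
    for line in template_lines:
        if line.split(":", 1)[0].strip() == "logfile":
            out.append(f"logfile:{logfile}")
            seen = True
        else:
            out.append(line.rstrip("\n"))
    if not seen:
        out.append(f"logfile:{logfile}")
    return out


def run_all(conf_path, logs, workdir="."):
    """Run pwebstats once per log, oldest first (assumes names sort by date)."""
    template = Path(conf_path).read_text().splitlines()
    tmp = Path(workdir) / "run-up.conf"
    for log in sorted(logs):
        tmp.write_text("\n".join(conf_for_log(template, log)) + "\n")
        subprocess.run(["./pwebstats", "-c", str(tmp)], check=True)
```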
