    use Zeitgeist;

    my $reflog    = '/var/log/httpd/referer_log';
    my $zeitgeist = '/home/me/www/zeitgeist.html';

    my $z = new Zeitgeist();
    $z->readlogs( files => ["$reflog", "$reflog.1", "$reflog.2"] );
    $z->toHTML(">$zeitgeist");
The Zeitgeist.pm module reads web server referer logs and parses the referrals from search engines to extract the search terms that led people to your site. It then presents the search terms as HTML output, with each term randomly colored and sized according to how many times it occurred in the referer logs.
The HTML output is simple, and should probably be included into a frame with server side includes.
Much of the text below will stubbornly spell referer as 'referer' and referral as 'referral'. Unless I goof up.
The constructor with no options sets the Zeitgeist instance up to work with the idiosyncratic nature of the author's site. These defaults can be changed in the Zeitgeist.pm source file by modifying the DEFAULTS constant.
They can also be changed on an instance-by-instance basis by providing options to the constructor.
    $z = new Zeitgeist(
        separator  => q( | ),
        zcat       => '/bin/zcat',
        size       => \&my_size_subroutine,
        reflex     => q/zeitgeist/,
        thisdomain => 'ghoti.net',
        refpos     => 1,
        targetpos  => 3
    );
The separator option controls what appears between each term on the page. The zcat option should be set to the path of the zcat binary on your system if you want to automatically decompress log files (see the readlogs() method below).
The reflex option is used to filter out searches which found the zeitgeist page itself. It should be set to a string that will match the path to your zeitgeist page. If you don't filter these out, the zeitgeist page itself will self-referentially spiral out of control, and the resulting singularity could cause minor damage in the vicinity of your web server.
Similarly, the thisdomain option should be set to your own domain, so that the parser skips expensive pattern matching on internal referrals not likely to be of interest.
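As a rough illustration of the filtering described above, the sketch below screens one referral and pulls out its search terms. The extract_term helper, the host comparison, and the assumption that the terms live in a "q" query parameter are all hypothetical; nothing here shows how Zeitgeist.pm does this internally.

```perl
use strict;
use warnings;

# Hypothetical filter-and-extract step. The option names mirror the
# constructor options, but the logic is an illustration only.
sub extract_term {
    my ( $referer, $target, %opt ) = @_;

    # Skip internal referrals before doing any further matching.
    my ($host) = $referer =~ m{^\w+://([^/:]+)};
    return if defined $host && $host =~ /(?:^|\.)\Q$opt{thisdomain}\E$/i;

    # Skip searches that landed on the zeitgeist page itself.
    return if $target =~ /$opt{reflex}/;

    # Assumption: the search engine carries the terms in a "q" parameter.
    return unless $referer =~ /[?&]q=([^&#]*)/;
    my $term = $1;

    # Undo URL encoding: "+" means space, %XX is a hex-encoded byte.
    $term =~ tr/+/ /;
    $term =~ s/%([0-9A-Fa-f]{2})/chr( hex($1) )/ge;
    return $term;
}

my $example = extract_term(
    'http://www.google.com/search?q=suv+bumper+sticker',
    '/index.html',    # hypothetical target page
    thisdomain => 'ghoti.net',
    reflex     => 'zeitgeist',
);
print "$example\n";    # prints "suv bumper sticker"
```

Checking the domain first keeps the more expensive query-string matching away from internal referrals, which is the motivation the thisdomain option gives.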
The refpos and targetpos options give the positions of the referer field and the target page field in each line of the referer log. The defaults are set up for a standard Apache referer log.
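To make the field positions concrete, here is a minimal sketch of splitting one log line. It assumes whitespace-separated fields counted from 1, which is only a guess consistent with the refpos => 1, targetpos => 3 example and the "uri -> document" layout of an Apache referer log. The split_referer_line helper and the sample target page are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the referer and target fields out of one
# log line. Assumes whitespace-separated fields numbered from 1.
sub split_referer_line {
    my ( $line, $refpos, $targetpos ) = @_;
    my @fields = split ' ', $line;
    return ( $fields[ $refpos - 1 ], $fields[ $targetpos - 1 ] );
}

# Made-up log line in the "uri -> document" layout.
my ( $referer, $target ) = split_referer_line(
    'http://www.google.com/search?q=suv+bumper+sticker -> /index.html',
    1, 3,
);
print "$referer\n$target\n";
```

Under this reading, field 2 is the "->" separator, which would explain why the target sits at position 3.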
The size option changes the algorithm that computes the font size in the HTML output. It should be a reference to a subroutine that takes a single parameter, the number of occurrences of the search term, and returns a two-element list: a number used as the relative size in the font tag, and a boolean that determines whether the entry is bold.
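A size callback matching that description might look like the sketch below. The name my_size_subroutine comes from the constructor example. The doubling rule, the cap of 4, and the bold threshold of 10 are illustrative choices, not the module's defaults.

```perl
use strict;
use warnings;

# Illustrative size callback: takes the occurrence count and returns
# (relative font size, bold flag).
sub my_size_subroutine {
    my ($count) = @_;

    # Add one size step each time the count doubles, capped at 4.
    my $size = 0;
    for ( my $n = $count; $n > 1 && $size < 4; $n = int( $n / 2 ) ) {
        $size++;
    }

    # Make frequently searched terms bold.
    my $bold = $count >= 10 ? 1 : 0;

    return ( $size, $bold );
}

my ( $size, $bold ) = my_size_subroutine(8);
print "size=$size bold=$bold\n";    # prints "size=3 bold=0"
```

It would be passed to the constructor as size => \&my_size_subroutine. A doubling scale keeps a handful of very popular terms from dwarfing everything else on the page.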
    $z->readlogs( files => ["file1", "file2", "file3.gz"] );
    $z->readlogs( handle => new FileHandle("/some/hideous/script |") );
The readlogs() method will open multiple log files and parse the search referral terms out of them. Currently all these files must have the same format. If any of the filenames end in '.gz' they will be unzipped before they are read.
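The '.gz' handling can be pictured with the sketch below, which opens compressed names through a pipe from zcat and everything else directly. The open_log helper is hypothetical and is not the module's own code. It also assumes file names without spaces or shell metacharacters, since the pipe command is built as a plain string.

```perl
use strict;
use warnings;
use FileHandle;

# Hypothetical helper: return a read handle for a log file, piping
# names that end in .gz through the given zcat binary.
sub open_log {
    my ( $file, $zcat ) = @_;
    my $fh = $file =~ /\.gz$/
        ? FileHandle->new("$zcat $file |")    # read zcat's output
        : FileHandle->new( $file, 'r' );      # plain file, read-only
    die "cannot open $file: $!" unless defined $fh;
    return $fh;
}
```

A handle opened this way for some unusual log could also be passed to readlogs() through the handle option.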
For the sake of flexibility, you can also pass readlogs() an already open filehandle, for example one reading the output of a script that does more complex parsing.
You can use both the files and handle options in the same call to readlogs(). Also, multiple calls to readlogs() are cumulative; they won't overwrite the data already loaded.
The toHTML() method writes the HTML to the specified file or already open FileHandle object (or any subclass of FileHandle). If you pass it a filename, then it is best that the user running the program have permission to write that file. Similarly, if you pass a FileHandle object, it should be opened for writing. This is all common sense, of course, but I feel compelled to mention it for some reason.
Passing a filename will result in that file being overwritten. If you want to append to the file, open your own FileHandle in append mode and pass that in. You can also open the FileHandle as a pipe to pass the HTML to some other program.
    $z->toXML($filename);
    $z->toXML($filehandle);
    $z->toXML($filename, $stylesheet);
The toXML() method is just like the toHTML() method, except that it produces an XML format of the data, which can then be munged with XSLT into whatever sort of HTML you like. The basic format of the XML file is:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <?xml-stylesheet type="text/xsl" href="zeitgeist.xsl.xslt"?>
    <zeitgeist>
      <search>
        <term occurrences="1"><![CDATA[suv bumper sticker]]></term>
        <uri><![CDATA[http://www.google.com/search?q=suv+bumper+sticker]]></uri>
        <rcolor>990000</rcolor>
      </search>
      <!-- Multiple search entries -->
    </zeitgeist>
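For anyone generating or checking this format by hand, the sketch below builds a single search entry in the layout shown above. The cdata and search_entry helpers are hypothetical and are not part of Zeitgeist.pm. The one non-obvious detail is that a literal "]]>" inside the text has to be split across two CDATA sections, otherwise it would end the section early.

```perl
use strict;
use warnings;

# Wrap text in a CDATA section, splitting any literal "]]>" so the
# section cannot be terminated early.
sub cdata {
    my ($text) = @_;
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $text . ']]>';
}

# Hypothetical helper: format one <search> entry.
sub search_entry {
    my ( $term, $count, $uri, $rcolor ) = @_;
    return join "\n",
        '  <search>',
        sprintf( '    <term occurrences="%d">%s</term>', $count, cdata($term) ),
        '    <uri>' . cdata($uri) . '</uri>',
        "    <rcolor>$rcolor</rcolor>",
        '  </search>';
}

print search_entry(
    'suv bumper sticker', 1,
    'http://www.google.com/search?q=suv+bumper+sticker', '990000',
), "\n";
```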
If a second argument is provided to the toXML() method, this is treated as the location of an XSL stylesheet, which will be referenced in the XML output.
Download the closest thing I have to it at the Zeitgeist Zeitcode page.