If you’re using LogMiner to analyse your Apache/IIS logs, here’s an easy recipe to be notified of sites that send visitors your way.
First, let’s define a Pg/PLSQL function to extract the new referrers:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | CREATE TYPE new_referrer AS ( site_id int8, site_name text, referrer text, hits INT ); CREATE OR REPLACE FUNCTION get_new_referrers() RETURNS SETOF new_referrer AS $body$ DECLARE rec new_referrer; lastRef int8; BEGIN SELECT INTO lastRef CAST( VALUE AS int8 ) FROM catalog WHERE name = 'LastReferrer'; IF lastRef IS NULL THEN INSERT INTO catalog ( name, VALUE ) VALUES ( 'LastReferrer', 0 ); lastRef := 0; END IF; FOR rec IN EXECUTE 'SELECT s.id AS site_id, s.name AS site_name, r.referrer, tmp.hits ' || 'FROM ( ' || ' SELECT site, referrer, COUNT(*) AS hits ' || ' FROM accesses ' || ' WHERE referrer > ' || lastRef || ' GROUP BY site, referrer ' || ') tmp ' || 'JOIN sites s ON s.id = tmp.site ' || 'JOIN referrers r ON tmp.referrer = r.id ' || 'WHERE r.extern = true ' || 'ORDER BY s.name, tmp.hits DESC, r.referrer' LOOP RETURN NEXT rec; END LOOP; UPDATE catalog SET VALUE = ( SELECT MAX( id ) FROM referrers ) WHERE name = 'LastReferrer'; RETURN; END; $body$ LANGUAGE plpgsql; |
Then, a little PHP script to run the query and format the results:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 | #!/usr/local/bin/php <?php $lines = array(); $lastSite = ''; // quickest and dirtiest way to execute a query in PHP ;-) exec ( "echo 'SELECT r.* " . "FROM get_new_referrers() r " . "LEFT JOIN search_engines se ON ( r.referrer ~* se.pattern AND r.referrer ~* se.query_pattern )" . "WHERE se.id IS NULL' | /usr/local/pgsql/bin/psql -U logminer -At logminer", $lines ); foreach( $lines as $line ) { $parts = explode( '|', $line, 4 ); if( $lastSite != $parts[ 0 ] ) { printf( " " . "------------------------------------------------------- " . "Referrers for %25s " . "------------------------------------------------------- ", $parts[ 1 ] ); $lastSite = $parts[ 0 ]; } printf( "[%25d] %25s ", $parts[ 3 ], $parts[ 2 ] ); } ?> |
The query above grabs the referrers for every site available in the DB, filtering requests coming from known search engines to reduce the level of “noise”.
This script is finally invoked by the cron job that processes the logs:
1 | referrers_report.php | mail -s 'New referrers report' email@example.com |