Who’s linking to your site?

If you’re using LogMiner to analyse your Apache/IIS logs, here’s an easy recipe to be notified of sites that send visitors your way.

First, let’s define a Pg/PLSQL function to extract the new referrers:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
CREATE TYPE new_referrer AS (
    site_id         int8,
    site_name       text,
    referrer        text,
    hits            INT
);

CREATE OR REPLACE FUNCTION get_new_referrers()
RETURNS SETOF new_referrer AS $body$
    DECLARE
        rec         new_referrer;
        lastRef     int8;
    BEGIN

        SELECT INTO lastRef CAST( VALUE AS int8 )
        FROM catalog
        WHERE name = 'LastReferrer';

        IF lastRef IS NULL THEN
            INSERT INTO catalog ( name, VALUE ) VALUES ( 'LastReferrer', 0 );
            lastRef := 0;
        END IF;

        FOR rec IN
            EXECUTE 'SELECT s.id AS site_id, s.name AS site_name, r.referrer, tmp.hits ' ||
                    'FROM ( ' ||
                    '   SELECT site, referrer, COUNT(*) AS hits ' ||
                    '   FROM accesses ' ||
                    '   WHERE referrer > ' || lastRef ||
                    '   GROUP BY site, referrer ' ||
                    ') tmp ' ||
                    'JOIN sites s ON s.id = tmp.site ' ||
                    'JOIN referrers r ON tmp.referrer = r.id ' ||
                    'WHERE r.extern = true ' ||
                    'ORDER BY s.name, tmp.hits DESC, r.referrer'
        LOOP

            RETURN NEXT rec;

        END LOOP;

        UPDATE catalog
        SET VALUE = ( SELECT MAX( id ) FROM referrers )
        WHERE name = 'LastReferrer';

        RETURN;
    END;
$body$ LANGUAGE plpgsql;

Then, a little PHP script to run the query and format the results:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
#!/usr/local/bin/php
<?php

        $lines    = array();
        $lastSite = '';

        // quickest and dirtiest way to execute a query in PHP ;-)
        exec ( "echo 'SELECT r.* " .
                   "FROM get_new_referrers() r " .
                   "LEFT JOIN search_engines se ON ( r.referrer ~* se.pattern AND r.referrer ~* se.query_pattern )" .
                   "WHERE se.id IS NULL' | /usr/local/pgsql/bin/psql -U logminer -At logminer",
                  $lines );

        foreach( $lines as $line ) {

                $parts = explode( '|', $line, 4 );

                if( $lastSite != $parts[ 0 ] ) {

                        printf( "



"
.
                                        "-------------------------------------------------------
"
.
                                        "Referrers for %25s
"
.
                                        "-------------------------------------------------------

"
,
                                        $parts[ 1 ] );

                        $lastSite = $parts[ 0 ];
                }

                printf( "[%25d] %25s

"
, $parts[ 3 ], $parts[ 2 ] );
        }
?>

The query above grabs the referrers for every site available in the DB, filtering requests coming from known search engines to reduce the level of “noise”.

This script is finally invoked by the cron job that processes the logs:

1
referrers_report.php | mail -s 'New referrers report' email@example.com

Leave a Reply

Your email address will not be published. Required fields are marked *