Building an Ad System for a Newspaper Web Site

When we launched the Philadelphia Newspapers web site, we needed to sell advertising. That is how newspapers pay for things. But there was no off-the-shelf software for serving ads on the web. No ad servers, no banner networks, nothing. If you wanted online advertising on your site, you had to build it yourself.

I wrote two scripts to handle this: go-to-ad for tracking ad clicks, and rotate_ads for rotating different ads into the same slot on a page.

The click tracker is a CGI program. Instead of linking directly to an advertiser’s URL, you link through the CGI: /cgi-bin/go-to-ad?http://advertiser.com&image=banner.gif. When someone clicks, the script logs the click and then sends an HTTP redirect to the advertiser’s site. The reader barely notices the redirect.
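
The request handling described above can be sketched in a few lines. This is a minimal illustration, not the production script: the helper names parse_click_query and redirect_response are made up for this example, and the URL is passed through unchecked, just as in the description.

```perl
use strict;
use warnings;

# Hypothetical helper: split a click-through query string of the form
# "URL&image=NAME" into the advertiser URL and the optional image name.
sub parse_click_query {
    my ($query) = @_;
    my ($url, $image) = split /&image=/, $query, 2;
    return ($url, $image);
}

# Hypothetical helper: build the CGI response that redirects the browser.
sub redirect_response {
    my ($url) = @_;
    return "Location: $url\n\n";
}

my ($url, $image) = parse_click_query('http://advertiser.com&image=banner.gif');
print redirect_response($url);
```

The web server turns the Location header into an HTTP redirect, which is why the reader sees only a brief hop before the advertiser’s page loads.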

The logging approach avoids database overhead entirely. Instead of inserting rows into a database, the script appends to flat files. The file name is derived from the ad’s image name or, if none is given, from its URL, and a new file starts each month. So all clicks on a particular ad in January 1996 go into one file, and February’s clicks go into another. This means you can analyze the data later on a different machine without worrying about locking a shared database during peak traffic.

The script does use file locking (flock) to handle the case where two readers click the same ad at the same moment. It locks the file, seeks to the end, writes the log line, and unlocks. Each log line records the date, time, referring page, browser, and the reader’s IP address.
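
The lock, seek, write, unlock sequence can be sketched as a small routine. This is one way to write it in current Perl, using the named constants from the core Fcntl module in place of the literal 2 and 8; the function name append_locked is made up for this example.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);

# Hypothetical helper: append one line to a log file under an exclusive lock.
# Returns 1 on success, 0 if the file could not be opened or locked.
sub append_locked {
    my ($path, $line) = @_;
    open(my $fh, '>>', $path) or return 0;
    unless (flock($fh, LOCK_EX)) {
        close($fh);
        return 0;
    }
    # Another writer may have appended while we waited for the lock,
    # so move to the current end of the file before writing.
    seek($fh, 0, SEEK_END);
    print {$fh} $line, "\n";
    # Closing the handle flushes buffered output and releases the lock.
    close($fh);
    return 1;
}
```

Returning 0 instead of dying on failure is a deliberate choice in this sketch: a click that cannot be logged should still be able to redirect the reader.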

Here is the part of the script that derives the log file name:

# The name of the log file is the Image name, if specified.
# If not, the URL is used.

($LogFile = ($Image or $URL)) =~ s{^\w+\://}{} ;

# Converting slashes to underscores because a unix file name can not
# contain any slashes.

$LogFile =~ tr[/][_] ;
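
To make the naming scheme concrete, here is a self-contained sketch that applies the same steps, plus the extension stripping and monthly suffix from the full source, to two sample inputs. The function name log_file_for is made up for this example. The year and month are passed in as plain numbers so the sketch does not need the Date_Time package, and the zero-padded YYYYMM suffix is an assumption about how that package formats its values.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the monthly log file path from an image
# name or, if no image name is given, from the advertiser URL.
sub log_file_for {
    my ($folder, $image, $url, $year, $month) = @_;
    my $name = $image || $url;
    $name =~ s{^\w+://}{};               # drop the scheme (http://, https://)
    $name =~ tr{/}{_};                   # slashes cannot appear in a file name
    $name =~ s/\.(?:gif|jpg|jpeg)$//i;   # drop a trailing image extension
    # Assumption: the suffix is a four-digit year and a two-digit month.
    return sprintf('%s/%s.%04d%02d', $folder, $name, $year, $month);
}

my $folder = '/inet/data/logs/advertisers';

print log_file_for($folder, 'banner.gif', 'http://advertiser.com', 1996, 1), "\n";
# prints /inet/data/logs/advertisers/banner.199601

print log_file_for($folder, undef, 'http://advertiser.com/specials/', 1996, 2), "\n";
# prints /inet/data/logs/advertisers/advertiser.com_specials_.199602
```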

The ad rotation script is simpler. It is called from server-parsed HTML (.shtml files) using a server-side include. Each time a page loads, the include calls rotate_ads, which picks an ad from the available pool and outputs the HTML for it. The actual selection logic lives in a library file (ad_choose_and_show.pl) so the rotation algorithm can be changed without touching every page.
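
The contents of ad_choose_and_show.pl are not reproduced in this article, so the following is only a guess at the general shape of such a library, not the actual code. It assumes a fixed in-memory pool, uniform random selection, and an /ads/ image directory; the advertiser names, the pool, and both function names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical ad pool: each entry is a click-through target and a banner image.
my @ads = (
    { url => 'http://advertiser-one.example/', image => 'one.gif' },
    { url => 'http://advertiser-two.example/', image => 'two.gif' },
);

# Return the HTML for one ad, linking through the click tracker so that
# clicks on the banner are logged before the redirect.
sub ad_html {
    my ($ad) = @_;
    return qq{<a href="/cgi-bin/go-to-ad?$ad->{url}&image=$ad->{image}">}
         . qq{<img src="/ads/$ad->{image}" alt="Advertisement"></a>};
}

# Pick one ad uniformly at random from the pool.
sub choose_ad {
    my (@pool) = @_;
    return $pool[ int rand @pool ];
}

print ad_html(choose_ad(@ads)), "\n";
```

Because the pages only ever call the include, swapping random selection for something else, such as weighting ads by how many impressions were sold, would mean changing choose_ad and nothing else.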

This is a basic system compared to what will probably exist in a few years. But it works, it handles real traffic, and it gives the advertising department the numbers they need: how many times each ad was shown, how many people clicked, and what pages they were on when they clicked. That is enough to sell ads and report results.
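
Producing those numbers from the flat files takes very little code. The sketch below counts clicks per referring page, assuming one click per line in the tab-separated KEY=value layout that the logger writes. The function names and the sample records (including the time format and addresses) are invented for illustration.

```perl
use strict;
use warnings;

# Parse one tab-separated log line of KEY=value fields into a hash reference.
sub parse_log_line {
    my ($line) = @_;
    chomp $line;
    my %field;
    for my $pair (split /\t/, $line) {
        # Limit the split to 2 so that "=" inside a URL stays in the value.
        my ($key, $value) = split /=/, $pair, 2;
        $field{$key} = defined $value ? $value : '';
    }
    return \%field;
}

# Count clicks per referring page across a list of log lines.
sub clicks_by_referer {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        next unless $line =~ /\S/;          # skip blank lines
        my $rec = parse_log_line($line);
        my $page = $rec->{HTTP_REFERER};
        $page = '(none)' unless defined $page && length $page;
        $count{$page}++;
    }
    return \%count;
}

# Made-up sample records for illustration only.
my @sample = (
    "DATE=19960115\tTIME=10:02:11\tHTTP_REFERER=http://www.example.com/sports/\tHTTP_USER_AGENT=Mozilla/2.0\tREMOTE_ADDR=192.0.2.10\n",
    "DATE=19960115\tTIME=10:05:42\tHTTP_REFERER=http://www.example.com/news/?id=7\tHTTP_USER_AGENT=Mozilla/2.0\tREMOTE_ADDR=192.0.2.11\n",
    "DATE=19960116\tTIME=08:30:00\tHTTP_REFERER=http://www.example.com/sports/\tHTTP_USER_AGENT=Mozilla/1.1\tREMOTE_ADDR=192.0.2.12\n",
);

my $count = clicks_by_referer(@sample);
for my $page (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
    printf "%5d  %s\n", $count->{$page}, $page;
}
```

Since each ad gets its own file per month, a monthly report for one ad is just this loop run over a single file.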

Like my earlier work reading GIF dimensions in Perl to automate image handling, this system is built to remove manual steps from web publishing. The whole thing runs on a Unix box with Perl and a standard web server. No special hardware, no database server, no proprietary software. The flat-file logging was a deliberate choice. For the volume of traffic we get, it is more reliable than introducing a database dependency into the critical path of serving a page.

I have made the source code freely available. If you are running a web site and need basic ad tracking, you can take these scripts and modify them for your setup.

Here is the full source code for the click-through logger:

#!/usr/local/bin/perl
# Program name: go-to-ad
# Installed in the CGI directory.

# Purpose:
#       This is a "click-through" CGI program that sends the browser to
# an advertiser's site.

# Optional Feature:
#       Before it does that, it can log certain information.
# The log file name is based on the url or image name passed to it.
# This avoids the overhead of dealing with a database. The data
# analysis and lookup can be done on a different machine later.

# Written by: Rajiv Pant (Betul)  http://rajiv.org

# Original Version: 1.0 1995/Dec
# Current Version: 1.1 1996/Jun

# Current status: Replaced by a more efficient Server API version.

use lib '/my/perl/lib/where/date/and/time/module/is/kept' ;

use Date_Time ; # Perl object package written by Betul.
                # Freely available at http://rajiv.org


$AdLogFolder = '/inet/data/logs/advertisers' ;

$ImageExtensions = '(gif|jpg|jpeg)' ;

$Date = new Date_Time ;

($URL, $Image) = split '&image=', $ENV{'QUERY_STRING'} ;



$ToBeLogged = 0 ; # Logging is switched off here. Set to 1 to enable it.


if ($ToBeLogged)
{

# The name of the log file is the Image name, if specified.
# If not, the URL is used. If the URL is used, then
# removing the initial http:// or https:// part of the URL
# for the log file name since over 99% of sites are http:// anyway.

($LogFile = ($Image or $URL)) =~ s{^\w+\://}{} ;


# Converting slashes to underscores because a unix file name can not
# contain any slashes.

$LogFile =~ tr[/][_] ;


# Removing the .GIF or .jpeg extension from the end of the file.
# We don't expect an advertiser's URL to end in an image.

$LogFile =~ s/\.$ImageExtensions$//i ;


# We start a new log file every month.

$LogFile = "$AdLogFolder/$LogFile." . $Date->year . $Date->month ;

open (LOGFILE, ">>$LogFile") ; # Disabled for now.

# Locking the log file so that another instance of this program
# or some other program wanting to open the same file has to wait
# until this instance unlocks it.

flock LOGFILE, 2 ; # 2 Means lock with exclusive rights on the file.

# Now we seek to the end of the file in case our previous lock
# request had to wait for another program to complete its work
# and unlock the file.

seek LOGFILE, 0, 2 ;


print LOGFILE

'DATE=',		$Date->year. $Date->month. $Date->day,	"\t",
'TIME=',		$Date->time_format_1,			"\t",
'HTTP_REFERER=',	$ENV{'HTTP_REFERER'},			"\t",
'HTTP_USER_AGENT=',	$ENV{'HTTP_USER_AGENT'},		"\t",
'REMOTE_ADDR=',		$ENV{'REMOTE_ADDR'},			"\n" ;


# Unlocking the file.

flock LOGFILE, 8 ; # 8 Means unlock the file.

close (LOGFILE) ;

} # end if ToBeLogged


# Redirecting the browser to go to the advertiser's URL specified.

print "Location: $URL\n\n" ;

And here is the ad rotation script:

#!/usr/local/bin/perl
# rotate_ads
# Rajiv Pant (Betul)

require '/inet/cgi/lib/ad_choose_and_show.pl' ;

print "Content-type: text/html\n\n" ;

&ad_choose_and_show ($ARGV [1], $ARGV[0]) ;