Some time ago a 'googlebot' PHP script was circulating on the net that sent you an email every time Google's crawler visited your page. After a while, though, the hype died down: if you run a blog that is updated regularly, Googlebot may check your pages for changes as often as twice a day. Do you really want to be notified every time that happens, for every page of your website?

So here is my version of the googlebot script, which writes a plain-text log so that it notifies you in a less obtrusive way. I've also included checks for the crawlers of other search engines besides Google (namely Yahoo!, MSN and the Russian search engine Yandex).

You can either paste this code snippet directly into your template or save it to a separate file and load it with an 'include' statement like this:

          <?php include('/path/to/your/blog/on/server/gb.php'); ?>
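The logging snippet itself is not shown here. As an illustration only, a minimal sketch of such a logger might look like the following. The function names detect_bot() and log_bot_visit(), the user-agent substrings (Googlebot, Slurp, msnbot, Yandex) and the "date";"engine";"url" line format are all assumptions, picked so that each line can be read back by the RSS script further down.

```php
<?php
// Hypothetical sketch of a search-engine visit logger (e.g. saved as gb.php).
// Assumed log format: one line per visit, "date";"engine";"url".

// Return the engine name for a known crawler user agent, or null otherwise.
// The substrings below are assumptions; crawler user agents change over time.
function detect_bot($userAgent) {
    $bots = array(
        'Googlebot' => 'Google',
        'Slurp'     => 'Yahoo!',
        'msnbot'    => 'MSN',
        'Yandex'    => 'Yandex',
    );
    foreach ($bots as $needle => $name) {
        if (stripos($userAgent, $needle) !== false) {
            return $name;
        }
    }
    return null; // Not a known crawler
}

// Append one visit to the log; returns the engine name, or null if nothing was logged.
function log_bot_visit($logfile, $userAgent, $url) {
    $bot = detect_bot($userAgent);
    if ($bot === null) {
        return null;
    }
    // Remove characters that would break the quoted, ";"-separated format
    $url  = str_replace(array('"', ';'), '', $url);
    $line = '"' . date('Y-m-d H:i:s') . '";"' . $bot . '";"' . $url . '"' . "\n";
    // LOCK_EX avoids interleaved writes when two requests arrive at once
    file_put_contents($logfile, $line, FILE_APPEND | LOCK_EX);
    return $bot;
}

// Usage when included from a page template (the log path is a placeholder)
if (isset($_SERVER['HTTP_USER_AGENT'])) {
    $host = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : '';
    $uri  = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '';
    log_bot_visit('/path/to/your/blog/on/server/googlebot.log',
                  $_SERVER['HTTP_USER_AGENT'], 'http://' . $host . $uri);
}
```

The closing ?> tag is left out on purpose, which is the usual practice for files that are only ever included; add it if you paste the code straight into a template instead.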


However, your log will grow quite large in a short time, so here is a complementary script that generates an RSS feed showing only the latest entries:

<?php
$siteurl="http://www.your.url/blog/";
// Path to log file
$sitepath = "/path/to/your/blog/on/server/"; // Note the trailing slash!
// Log file name
$logfile=$sitepath."googlebot.log";
// Number of entries to display
$num=10;

// Send a proper HTTP header for an RSS file
header('Content-type: text/xml');
// Single quotes keep the inner double quotes intact; "\n" needs double quotes
echo '<?xml version="1.0" ?>' . "\n";
?>

<rss version="2.0">
  <channel>
   <title>Search engine watch</title>

   <link><?php echo $siteurl; ?></link>
   <description />
   <language>en-us</language>

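<?php
// Read the whole log file into an array of lines
$lines = file($logfile);
// Log missing or unreadable: output an empty feed instead of PHP errors
if ($lines === false) $lines = array();
// Skip to the last $num lines
$start = count($lines) - $num;
if ($start < 1) $start = 1;
for ($i = $start; $i < count($lines); $i++) {
   // Strip quotes and the trailing newline, then split the line on ";"
   $data = explode(";", trim(str_replace('"', '', $lines[$i])));
   if (count($data) < 3) continue; // Skip malformed lines
   echo "<item>\n";
   echo "<pubDate>" . date("r", strtotime($data[0])) . "</pubDate>\n";
   echo "<title>" . htmlspecialchars($data[1], ENT_QUOTES) . "</title>\n";
   echo "<link>" . htmlspecialchars($data[2], ENT_QUOTES) . "</link>\n";
   echo "</item>\n";
}
// No fclose() needed: file() opens, reads and closes the file itself
?>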

 </channel>
</rss>


Save it as a separate .php file on your server, then simply open its URL in your browser or feed aggregator. You can see the result in the Googlebot RSS feed for my blog.
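To see what the feed script produces for a single log entry, here is a variant of its loop body pulled out into a hypothetical helper, log_line_to_item(), and fed with an invented log line. The "date";"title";"url" line format is inferred from how the script cleans and splits each line, not taken from the original logger.

```php
<?php
// Hypothetical helper: turn one log line into an RSS <item>.
// Assumes the "date";"title";"url" format the feed script parses.
function log_line_to_item($line) {
    // Strip quotes and the trailing newline, then split on ";"
    $data = explode(';', trim(str_replace('"', '', $line)));
    if (count($data) < 3) {
        return ''; // Malformed line: emit nothing
    }
    return "<item>\n"
         . "<pubDate>" . date('r', strtotime($data[0])) . "</pubDate>\n"
         . "<title>" . htmlspecialchars($data[1], ENT_QUOTES) . "</title>\n"
         . "<link>" . htmlspecialchars($data[2], ENT_QUOTES) . "</link>\n"
         . "</item>\n";
}

// Example with a made-up log line; the "&" in the URL becomes "&amp;" in the output
echo log_line_to_item('"2007-05-14 10:32:00";"Google";"http://www.your.url/blog/?p=1&page=2"' . "\n");
```

Escaping matters here because a raw "&" in a query string would make the whole feed invalid XML, and most aggregators refuse to display a feed that does not parse.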

P.S. You can also integrate this feed into your site using the technique described in my previous article, 'Integrating news feeds using Magpie RSS'.