Varnish + logging + geoip howto

Varnish is the new buzzword making the rounds on the “speed up your website” circuit. Does it work? Suffice to say, YMMV. Squid and Apache (in reverse proxy mode) were doing this job earlier, so the difference will probably lie in how Varnish handles dynamic content and how many resources it eats up.

This post is about getting geo targeting to work when proxy software is caching your static content (and your dynamic content too, if your scripts set their headers properly). A proxy sits between your webserver and the visitor. It keeps a store of objects (images, static content etc.) and serves them itself, so the webserver is not overburdened with requests (or at least sees them less often). All good and happy, especially for the server admin, and moderately so for the visitor as well, because response time is slightly reduced.

One problem you will face is working out where your traffic is coming from when you look at your webserver access logs. Since all requests now reach the webserver from the proxy, the logs show just one IP: that of your Varnish install. The fix is to change the log format of your access logs. For Apache, set the log format like this:

LogFormat '%{X-Forwarded-For}i %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"' combined

Then make sure the access log uses the combined format and restart Apache. The access logs will now record the visitor’s IP instead of the proxy server’s address, and you can go back to using your statistics programs as in the pre-proxy era.

GeoIP needs a few changes as well. I use the PHP geoip.inc method (I keep switching between LiteSpeed, lighttpd and Apache depending on what I’m researching at the moment), so the include method works better for me than mod_geoip.
So, a small change to the code:


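include("geoip.inc");
// Varnish passes the visitor's address in the X-Forwarded-For request header
$headers = apache_request_headers();
// Fall back to the connecting address when the header is absent
$ip = isset($headers["X-Forwarded-For"]) ? $headers["X-Forwarded-For"] : $_SERVER["REMOTE_ADDR"];
$gi = geoip_open("/usr/local/share/GeoIP/GeoIP.dat", GEOIP_STANDARD);
$country_code = geoip_country_code_by_addr($gi, $ip);
geoip_close($gi);
echo $country_code;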

This uses geoip.inc and the PHP geoip extension from MaxMind.
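
One thing to watch: when a request passes through more than one proxy, X-Forwarded-For can carry a comma-separated chain of addresses, and the whole header value is then not a usable IP for the lookup. Here is a minimal sketch of picking one address out of it. The function name client_ip_from_xff() is made up for this example, and taking the first valid entry is an assumption that suits a single trusted Varnish in front of the webserver; the header is set by the client side, so don't rely on it for anything security related.

```php
<?php
// Hypothetical helper (not part of geoip.inc): return the first valid IP
// found in an X-Forwarded-For value, or $fallback if there is none.
function client_ip_from_xff($xff, $fallback)
{
    // The header may look like "client, proxy1, proxy2".
    foreach (explode(',', (string) $xff) as $candidate) {
        $candidate = trim($candidate);
        // filter_var() returns false for anything that is not an IPv4/IPv6 address.
        if (filter_var($candidate, FILTER_VALIDATE_IP) !== false) {
            return $candidate;
        }
    }
    return $fallback;
}

// Example with documentation-range addresses:
echo client_ip_from_xff('203.0.113.7, 10.0.0.1', '127.0.0.1'), "\n";
```

In the lookup code, the result of this function would take the place of the raw header value, with $_SERVER["REMOTE_ADDR"] as the fallback.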

If you need help with Varnish, website tuning, geo targeting or anything else related to Linux webservers, please contact us.

One thing to look at in geoip.inc: if you get “cannot redeclare” errors, replace these two function definitions with the guarded versions below.

if (!function_exists('geoip_country_code_by_name')) {
    function geoip_country_code_by_name($gi, $name) {
        $country_id = geoip_country_id_by_name($gi, $name);
        if ($country_id !== false) {
            return $gi->GEOIP_COUNTRY_CODES[$country_id];
        }
        return false;
    }
}
if (!function_exists('geoip_country_name_by_name')) {
    function geoip_country_name_by_name($gi, $name) {
        $country_id = geoip_country_id_by_name($gi, $name);
        if ($country_id !== false) {
            return $gi->GEOIP_COUNTRY_NAMES[$country_id];
        }
        return false;
    }
}

In the VCL for the Varnish instance handling the website, add this:

sub vcl_pass {
    set bereq.http.connection = "close";
    if (req.http.X-Forwarded-For) {
        set bereq.http.X-Forwarded-For = req.http.X-Forwarded-For;
    } else {
        set bereq.http.X-Forwarded-For = regsub(client.ip, ":.*", "");
    }
}

mod_geoip howto

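# build and install the GeoIP C library
wget http://geolite.maxmind.com/download/geoip/api/c/GeoIP-1.4.6.tar.gz
tar -zxf GeoIP-1.4.6.tar.gz
cd GeoIP-1.4.6
./configure
make
make install
cd ..
# build mod_geoip (the mod_geoip2 tarball must already be in this directory)
tar -zxf mod_geoip2_1.2.4.tar.gz
cd mod_geoip2-1.2.4/
apxs -i -a -L/usr/local/lib -I/usr/local/include -lGeoIP -c mod_geoip.c
# add the two GeoIP lines below to httpd.conf
vi /etc/httpd/conf/httpd.conf

GeoIPEnable On
GeoIPDBFile /usr/local/share/GeoIP.dat

# copy GeoIP.dat to the path given in GeoIPDBFile
scp imtiaz@netbrix.net:/home/imtiaz/src/GeoIP.dat /usr/local/share/

# restart Apache to load the module
/etc/init.d/httpd stop
/etc/init.d/httpd start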

mails being rejected at AOL or Yahoo or Gmail?

If you send mail from a webform using the PHP mail() function and the destination servers are rejecting it, here’s a small trick to fix the issue.

This usually happens when the envelope does not carry a valid, identifiable sender address. So instead of using something like:

mail($to, $subject, $message, $headers);
where the sender address is not set explicitly, try something like this:
mail($to, $subject, $message, $headers,"-femail@domain");

This sets the correct sender address on the mail envelope. Without it, the address is usually picked up from php.ini on the server, and if none is configured there it defaults to webuser@hostname.

It’s good practice to set the fifth parameter of your mail() call, as it prevents confusion at the recipient’s end.
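
If the sender address comes from a config file or a form rather than being typed into the script, it is worth checking it before it reaches mail(), since the fifth parameter is handed on to the sendmail binary. A minimal sketch of that check; envelope_sender_param() is a made-up helper name and webmaster@example.com is a placeholder address:

```php
<?php
// Hypothetical helper: build the fifth mail() argument from a sender address.
// Returns null when the address does not look like a valid email address.
function envelope_sender_param($address)
{
    // FILTER_VALIDATE_EMAIL rejects values containing spaces, so a string
    // that tries to smuggle in extra sendmail options is refused here.
    if (filter_var($address, FILTER_VALIDATE_EMAIL) === false) {
        return null;
    }
    return '-f' . $address;
}

$param = envelope_sender_param('webmaster@example.com');
if ($param !== null) {
    // mail($to, $subject, $message, $headers, $param);
    echo $param, "\n";
}
```

The mail() call is left commented out so the snippet can be run without sending anything.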

Squeaky clean virus free websites — on shared hosts

Hackers or script kiddies often end up defacing and hijacking your website. They can do this because you are not looking! Or not looking hard enough.

An antimalware/antivirus scanner for Linux is a good thing to have, as it lets you scan the files on your server periodically (using cron jobs). Backdoors are mostly uploaded through the upload forms made available for genuine users to upload pictures, attachments and other content. If these forms are badly written, or if nothing prevents uploaded code from being executed in the webroot, an attacker can pretty much wipe out the server. So you need to be proactive: check your files as often as possible and clean them before things get out of hand.
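
As an illustration of what a less badly written upload form might check, here is a minimal sketch of an extension allowlist. The function name is_allowed_upload_name() and the list of extensions are assumptions made for this example; a check like this is only one layer and does not replace scanning the files or disabling script execution in the upload directory.

```php
<?php
// Hypothetical helper: accept an uploaded file name only if its extension
// is on an allowlist. The default list is an example, not a recommendation.
function is_allowed_upload_name($name, array $allowed = array('jpg', 'jpeg', 'png', 'gif', 'pdf'))
{
    // basename() drops any directory part slipped into the name.
    $base = basename($name);
    // Refuse names with zero or several dots, such as "shell.php.jpg",
    // which some webserver setups may still hand to the PHP interpreter.
    if (substr_count($base, '.') !== 1) {
        return false;
    }
    $ext = strtolower(pathinfo($base, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true);
}

var_dump(is_allowed_upload_name('photo.JPG'));
var_dump(is_allowed_upload_name('shell.php'));
```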

ClamAV is a good antivirus/malware scanner. Set it up to do your file watch job. Here’s the process to get it going on DreamHost.

  • svn co http://svn.clamav.net/svn/clamav-devel/trunk/
  • cd trunk, then ./configure --prefix=$HOME --user=yourusername --group=yourgroupname
  • make && make install
  • edit /home/yourusername/etc/freshclam.conf and comment out the 4th line as instructed
  • edit /home/yourusername/etc/clamd.conf and make the same edit for clamd
  • run freshclam and then clamdscan $HOME to find any backdoor or virus payload on your website.

Contact us if you need assistance with reclaiming your hacked website/server, or with other Linux server admin requirements.

keep your server clock ticking right

Time is money!
It’s important, from many points of view besides security, to keep the server time correct. The tool to use is ntpdate, which syncs the server time to an NTP (Network Time Protocol) server.

Put this in your crontab if your server is in or near North America:
5 0 * * * /usr/sbin/ntpdate -u 0.north-america.pool.ntp.org

This syncs the clock to the NTP server at five past midnight every day.

For a complete list of which servers to use, check the NTP Pool Project site (pool.ntp.org).