Using Varnish as a CDN

Update – 22/02/15: This site now uses a Varnish-backed CDN. Turns out it was pretty straightforward to implement 🙂

Varnish is a front-end caching proxy, and in this setup it will serve only static content. The way it works is not far removed from how the servers at the top level of an origin-pull CDN operate.

A CDN server receives a request for a static file and delivers it. If the file is on hand then it sends it; if it isn’t available then it pulls the file from the origin and serves that while storing the file in its cache. That’s basically what Varnish does as well.
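As a rough illustration of that origin-pull behaviour, here is a minimal Python sketch. It is a toy model of the idea, not Varnish code: the `OriginPullCache` class, the `ORIGIN_FILES` dict standing in for the origin server, and the example path are all made up for the example.

```python
from typing import Callable, Dict


class OriginPullCache:
    """Toy model of an origin-pull cache node (hypothetical, not Varnish itself)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        # File is on hand: serve it straight from the cache.
        if path in self._store:
            self.hits += 1
            return self._store[path]
        # Not on hand: pull from the origin, keep a copy, then serve it.
        self.misses += 1
        body = self._fetch(path)
        self._store[path] = body
        return body


# Assumption: a dict stands in for the origin (statics) server.
ORIGIN_FILES = {"/css/site.css": b"body{margin:0}"}


def fetch(path: str) -> bytes:
    return ORIGIN_FILES[path]


node = OriginPullCache(fetch)
node.get("/css/site.css")  # first request: miss, pulled from the origin
node.get("/css/site.css")  # second request: hit, served from the cache
```

The only state that matters is the store: the first request for a path costs a trip to the origin, and every later one is answered locally.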

I’ve had this idea for a while but only recently added a spare Varnish caching proxy to my cluster that I can use for testing.

The primary server will be the one rewriting URLs to point to the CDN domains.

I already have a basic statics server set up and running on a separate box from the main domain. It’s got a push-style propagation method, is configured with long expirations for files and doesn’t accept cookies.

That already gets populated with files, so I’ll use it as the origin to keep the busy worker count on the main server down. I want about half the requests to hit the statics server and half to hit the Varnish server. The main domain will be the one that handles the rewriting to make that happen. I may use the W3 Total Cache plugin to do it through PHP, or I might use mod_pagespeed domain sharding through Apache.
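To make the half-and-half split concrete, here is a small Python sketch of the general sharding idea: hash the file path and use the result to pick one of two hostnames. This is not how W3 Total Cache or mod_pagespeed do it internally; the hostnames and the `cdn_url` helper are hypothetical.

```python
import hashlib

# Assumption: placeholder hostnames for the statics server and the Varnish node.
CDN_HOSTS = ("static.example.com", "cdn.example.com")


def cdn_url(path: str) -> str:
    """Map an asset path to one CDN host, always the same host for the same path."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = digest[0] % len(CDN_HOSTS)
    return f"https://{CDN_HOSTS[index]}{path}"


for asset in ("/css/site.css", "/js/app.js", "/img/logo.png"):
    print(cdn_url(asset))
```

Hashing rather than picking a host at random means a given file always comes from the same domain, so browser caches and the Varnish cache stay useful. Across a large number of files the split should come out roughly even, though not exactly half each.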

Using mod_pagespeed for it has its benefits, but it would likely be a much simpler setup process if done with W3 Total Cache.

The statics server already gets filled with its files (initially filled with rsync and kept up to date by W3 Total Cache – rsync also runs on cron to make sure the files are always the latest versions). The Varnish server is not primed when it starts like the statics server is. We need to tell Varnish where it can find the files it doesn’t have.

To do that, a backend is set that points to the statics server. Varnish is told that when a request arrives on a specific domain and the file isn’t in its cache, it should request the file from that backend, cache it, and serve it from the cache next time.

Once things are set up and the Varnish cache has been running for a while, it’ll serve as a second node in the Content Delivery Network. The benefit of doing it this way is that extra nodes can be added with ease. They can be put to use as I’ve described, behind load balancers, or with GeoIP targeting, all with minimal configuration.

That’s the basic idea in a nutshell. In reality things are a little more complicated. There are on-the-fly optimizations performed on the upstream server by mod_pagespeed, and there is likely a need to create some kind of tweaked LRU eviction function for Varnish so that its cache doesn’t fill with multiple versions of the same file at various optimization stages.
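I haven’t worked out what that would look like in Varnish itself, so the following is only a Python model of the idea and not a Varnish feature or API. It is an LRU cache that keeps a single variant per file: storing a newer variant of a resource replaces the older one instead of sitting next to it. The `SingleVariantLRU` class and the variant labels are hypothetical.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class SingleVariantLRU:
    """LRU cache holding at most one variant per resource (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        # resource path -> (variant label, body), ordered oldest to newest use
        self._items: "OrderedDict[str, Tuple[str, bytes]]" = OrderedDict()

    def put(self, resource: str, variant: str, body: bytes) -> None:
        # Drop any older variant of the same file before storing the new one.
        self._items.pop(resource, None)
        self._items[resource] = (variant, body)
        # Over capacity: evict the least recently used resource.
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)

    def get(self, resource: str, variant: str) -> Optional[bytes]:
        entry = self._items.get(resource)
        if entry is None or entry[0] != variant:
            return None
        self._items.move_to_end(resource)  # mark as recently used
        return entry[1]


cache = SingleVariantLRU(capacity=2)
cache.put("/img/logo.png", "original", b"raw bytes")
cache.put("/img/logo.png", "optimized", b"smaller bytes")
print(cache.get("/img/logo.png", "original"))   # None: replaced by the newer variant
print(cache.get("/img/logo.png", "optimized"))  # b'smaller bytes'
```

The trade-off is that a request for the older variant becomes a miss again, which is only acceptable if the later optimization stage is the one that should win.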

I’ll deal with the problems as I come across them but there’s no harm in testing the idea and building the system. The potential performance increase for a small single site is probably negligible but across an entire network of sites it’s likely to amount to a substantial performance improvement all round.