by Nate Haug on April 5, 2011

Configuring Varnish for High-Availability with Multiple Web Servers

Varnish is a very popular software package that can dramatically accelerate the work of serving HTTP pages. Varnish caches fully-rendered responses to HTTP requests and serves them without the delay of building content from scratch. Because it's so much more efficient than building a page with Apache and Drupal, Lullabot regularly deploys Varnish when a site needs to handle high levels of anonymous traffic. In many situations, installing Varnish on the same machine as MySQL and Apache can help squeeze more performance out of a single box. However, in most of our deployments, we're working with "high availability" setups, where dedicated servers handle different functions and redundant backup servers are on hand for every piece of the infrastructure. That means two database servers, two web servers, two Varnish servers, and so on.

A typical Drupal high-availability setup with Varnish.

This article is all about configuring Varnish optimally for these high-availability setups, in which multiple dedicated back-end servers are protected from heavy traffic by Varnish serving cached content to the outside world. We also have some neat tricks for server maintenance, optimizing your cached content, and configuring Varnish to act as a fail-safe even if all of your back-end servers go down.

Basic Varnish Configuration

Varnish configuration usually lives in three places: the boot script, the system-wide configuration file, and the VCL file that does most of the work.

The first script that starts up Varnish is usually located with the rest of your system startup scripts at /etc/init.d/varnish. This file rarely needs adjustments, but it can be interesting to read or help you locate further configuration (since the startup script is responsible for calling the next file).

The second file is usually located at /etc/sysconfig/varnish (on CentOS and RedHat machines) or /etc/default/varnish (on Ubuntu). This file defines global configuration for Varnish, such as which port it should run on and where it should store its cache. It typically contains several numbered "options" that are just different ways of writing the same thing. It doesn't matter which option you use; just be sure to change your storage backend. The default is usually "file", which stores cached information on disk. Be absolutely sure to change this to "malloc", which stores information in memory! If you don't have enough memory in your box for a decent-sized cache (say, a few gigabytes), consider a memory upgrade.

Here's a configuration that we have running on a very popular site with a large number of images and pages being cached (this is using the "Option 2" in the /etc/sysconfig/varnish file):

DAEMON_OPTS="-a :80,:443 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -u varnish -g varnish \
             -S /etc/varnish/secret \
             -p thread_pool_add_delay=2 \
             -p thread_pools=<Number of CPU cores> \
             -p thread_pool_min=<800 / Number of CPU cores> \
             -p thread_pool_max=4000 \
             -p session_linger=50 \
             -p sess_workspace=262144 \
             -s malloc,3G"

The last line is the most important to set up. In this case we're allocating 3GB of memory for Varnish's dedicated use. Also note the path in this file that references which VCL file you will use; it's probably best to stick with whatever file path your distribution uses. Be sure to replace the <Number of CPU cores> placeholders above with your actual server information. You can find the number of processor cores in your machine by running grep processor /proc/cpuinfo, which returns a line for each processor core you have available.
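For example, on a hypothetical 4-core server (an assumption for illustration; substitute your own core count), the placeholder arithmetic works out to:

```
# 4 CPU cores, so one thread pool per core:
-p thread_pools=4 \
# 800 / 4 cores = 200 minimum threads per pool:
-p thread_pool_min=200 \
```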

VCL Configuration

The VCL file is the main location for configuring Varnish and it's where we'll be doing the majority of our changes. It's important to note that Varnish includes a large set of defaults that are always automatically appended to the rules that you have specified. Unless you force a particular command like "pipe", "pass", or "lookup", the defaults will be run. Varnish ships with an entirely commented-out default.vcl file for reference.

We'll be going through each of the sections individually down below, but for ease of reading here's a complete copy of the VCL file that we're currently using, and a copy of the defaults that Varnish will automatically append to our own rules.

Updated 4/9/2012: I've added an updated VCL and examples here for Varnish 3, which differs slightly from the Varnish 2 configuration.

Now let's get started walking through the most interesting stuff, the VCL!

Health Checks and Directors

A more recent feature of Varnish (2.x and higher) has been the addition of "directors" to send traffic to any number of web servers. These directors can do regular health checks on each web server to see if the server is still running smoothly. If Varnish receives a request for an asset that it hasn't yet cached, it will only pass on the request to a "healthy" server.

Our VCL file is typically set up to handle both HTTP and HTTPS traffic, so we need to define the list of web servers by their IP addresses twice: once for port 80, which serves normal pages, and again for port 443 for secure connections. If you've set up Apache on a different port, reference that port here.

# Define the list of backends (web servers).
# Port 80 Backend Servers
backend web1 { .host = "192.10.0.1"; .probe = { .url = "/status.php"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; }}
backend web2 { .host = "192.10.0.2"; .probe = { .url = "/status.php"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; }}

# Port 443 Backend Servers for SSL
backend web1_ssl { .host = "192.10.0.1"; .port = "443"; .probe = { .url = "/status.php"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; }}
backend web2_ssl { .host = "192.10.0.2"; .port = "443"; .probe = { .url = "/status.php"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; }}

# Define the director that determines how to distribute incoming requests.
director default_director round-robin {
  { .backend = web1; }
  { .backend = web2; }
}

director ssl_director round-robin {
  { .backend = web1_ssl; }
  { .backend = web2_ssl; }
}

# Respond to incoming requests.
sub vcl_recv {
  # Set the director to cycle between web servers.
  if (server.port == 443) {
    set req.backend = ssl_director;
  }
  else {
    set req.backend = default_director;
  }
}

The last part of the configuration above is our vcl_recv sub-routine. It selects which set of servers will be used based on the port on which Varnish received the traffic.

It's a good idea, then, to set up a reasonable health check on each web server to make sure that server is ready to deliver traffic. We use a file located directly in the root of the web server called "status.php" to check if the web server is healthy. This file does a few checks, including:

  • Bootstrapping Drupal
  • Connecting to the Master database
  • Connecting to the Slave database (if any)
  • Connecting to Memcache (if in use)
  • Checking that the files directory is accessible

If any of these checks fail, the file throws a 500 server error and Varnish will take the web server out of rotation. The status.php file is also extremely useful for intentionally taking a web server out of rotation. Simply move the status.php file to a new location (like status-temp.php) and Varnish will automatically remove the server from rotation while the server itself stays up so that it may be serviced independently of the other web servers. This approach is common when performing upgrades or installations of new software on the web servers.

While our status.php script is fairly universal, it may require some tweaking for your own purposes. If you have additional services that are required for a server to function properly, adding additional checks to the script will ensure Varnish doesn't cache bad data from a broken server.
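Our full status.php isn't reproduced here, but a minimal sketch of the idea might look like the following. The Drupal 6-era function names (drupal_bootstrap, db_query, file_directory_path) are real APIs of that version; the specific checks and messages are illustrative, and your own script should test whatever services your site actually depends on:

```php
<?php
// Illustrative health-check sketch. Any failed check adds to $errors;
// if anything failed we return HTTP 500 so Varnish's probe marks this
// backend unhealthy and takes it out of rotation.
$errors = array();

// Bootstrap Drupal so database and cache settings are available.
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// Check the database connection with a trivial query.
if (!db_result(db_query("SELECT 1"))) {
  $errors[] = 'Database is not responding.';
}

// Check that the files directory is accessible and writable.
if (!is_writable(file_directory_path())) {
  $errors[] = 'Files directory is not writable.';
}

if ($errors) {
  header('HTTP/1.1 500 Internal Server Error');
  print implode("\n", $errors);
}
else {
  print 'OK';
}
```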

Caching Even if Apache Goes Down

Even in an environment where everything has a redundant backup, it's possible for the entire site to go "down" due to any number of causes: a programming error, a database connection failure, or just plain excessive amounts of traffic. In such scenarios, the most likely outcome is that Apache will be overloaded and begin rejecting requests. In those situations, Varnish can save your bacon with the Grace period. Apache gives Varnish an expiration date for each piece of content it serves. Varnish automatically discards outdated content and retrieves a fresh copy when it hits the expiration time. However, if the web server is down, it's impossible to retrieve the fresh copy. "Grace" is a setting that allows Varnish to serve up cached copies of a page even after the expiration period if Apache is down. Varnish will continue to serve up the outdated cached copies it has until Apache becomes available again.

To enable Grace, you just need to specify the setting in vcl_recv and in vcl_fetch:

# Respond to incoming requests.
sub vcl_recv {
  # Allow the backend to serve up stale content if it is responding slowly.
  set req.grace = 6h;
}

# Code determining what to do when serving items from the Apache servers.
sub vcl_fetch {
  # Allow items to be stale if needed.
  set beresp.grace = 6h;
}

Both of these settings can be the same, but the setting in vcl_fetch must be at least as long as the setting in vcl_recv. Think of the vcl_fetch grace setting as "the maximum time Varnish should keep an object". The setting in vcl_recv, on the other hand, defines when Varnish should use a stale object if it has one.

Just remember: while the powers of grace are awesome, Varnish can only serve up a page that it has already received a request for and cached. This can be a problem when you're dealing with authenticated users, who are usually served customized versions of pages that are difficult to cache. If you're serving uncached pages to authenticated users and all of your web servers die, the last thing you want is to present them with error messages. Instead, wouldn't it be great if Varnish could "fall back" to the anonymous pages that it does have cached until the web servers came back? Fortunately, it can -- and doing this is remarkably easy! Just add this extra bit of code into the vcl_recv sub-routine:

# Respond to incoming requests.
sub vcl_recv {
  # ...code from above.

  # Use anonymous, cached pages if all backends are down.
  if (!req.backend.healthy) {
    unset req.http.Cookie;
  }
}

Varnish sets the property req.backend.healthy if any web server is available. If all web servers go down, this flag becomes false. Varnish will then strip the cookie that indicates a logged-in user from incoming requests and attempt to retrieve an anonymous version of the page. As soon as one server becomes healthy again, Varnish will quit stripping the cookie from incoming requests and pass them along to Apache as normal.

Making Varnish Pass to Apache for Uncached Content

Often when configuring Varnish to work with an application like Drupal, you'll have some pages that should absolutely never be cached. In those scenarios, you can easily tell Varnish to not cache those URLs by returning a "pass" statement.

# Do not cache these paths.
if (req.url ~ "^/status\.php$" ||
    req.url ~ "^/update\.php$" ||
    req.url ~ "^/ooyala/ping$" ||
    req.url ~ "^/admin/build/features" ||
    req.url ~ "^/info/.*$" ||
    req.url ~ "^/flag/.*$" ||
    req.url ~ "^.*/ajax/.*$" ||
    req.url ~ "^.*/ahah/.*$") {
  return (pass);
}

Varnish will still act as an intermediary between requests from the outside world and your web server, but the "pass" command ensures that it will always retrieve a fresh copy of the page.

In some situations, though, you do need Varnish to give the outside world a direct connection to Apache. Why is it necessary? By default, Varnish will always respond to page requests with an explicitly specified "Content-Length" header. This information allows web browsers to display progress indicators to users, but some types of files don't have predictable lengths. Streaming audio and video, and any files that are being generated on the server and downloaded in real time, are of unknown size, and Varnish can't provide the Content-Length information. This is often encountered on Drupal sites when using the Backup and Migrate module, which creates a SQL dump of the database and sends it directly to the web browser of the user who requested the backup.

To keep Varnish working in these situations, it must be instructed to "pipe" those special request types directly to Apache.

# Pipe these paths directly to Apache for streaming.
if (req.url ~ "^/admin/content/backup_migrate/export") {
  return (pipe);
}
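One caveat worth noting: once a request is piped, Varnish stops interpreting that connection, so a subsequent keep-alive request on the same client connection would also bypass VCL processing. A commonly recommended safeguard (a sketch based on standard Varnish advice of this era, not part of the configuration above) is to force piped connections to close:

```
# Close piped connections so that follow-up keep-alive requests on the
# same client connection do not bypass VCL processing.
sub vcl_pipe {
  set bereq.http.connection = "close";
  return (pipe);
}
```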

Finally, while we're discussing paths that need exceptions, Varnish is also a good place to restrict access to specific URLs. Because the VCL file is so flexible, it can be a good place to lock down paths that should never be seen by the outside world.

Up at the very top of our VCL file, we have a line that defines an access control list of IP addresses. These addresses are considered to be "internal" to our environment. A common example is restricting public access to Drupal's cron.php file so that only local web servers can trigger expensive tasks like search indexing. Since the local web servers all have IP addresses that begin with "192.10.0.", they are granted access while all others receive an access-denied message.

At the top of the default.vcl file:

# Define the internal network subnet.
# These are used below to allow internal access to certain files while not
# allowing access from the public internet.
acl internal {
  "192.10.0.0"/24;
}

And then inside of vcl_recv:

# Respond to incoming requests.
sub vcl_recv {
  # ...code from above.

  # Do not allow outside access to cron.php or install.php.
  if (req.url ~ "^/(cron|install)\.php$" && !client.ip ~ internal) {
    # Have Varnish throw the error directly.
    error 404 "Page not found.";
    # Use a custom error page that you've defined in Drupal at the path "404".
    # set req.url = "/404";
  }
}

Optimizing Varnish's Cache

First, let's dissect a very popular but misguided configuration that's made the rounds on the internet. When describing how to serve cached content to users with cookies, a number of sources recommended this solution:

# Routine used to determine the cache key if storing/retrieving a cached page.
sub vcl_hash {
  # Do NOT use this unless you want to store per-user caches.
  if (req.http.Cookie) {
    set req.hash += req.http.Cookie;
  }
}

Generally, this is not a useful approach unless you're serving up the same page to a single user repeatedly. It will recognize the unique cookie that Drupal gives to every logged in user, and use it to keep cached content for one user from being displayed to another. However, this approach is a waste: Drupal explicitly returns a Cache-Control header for all authenticated users that prevents Varnish from caching their content:

Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0

In other words, don't use this approach unless you have an explicit reason to cache authenticated pages. In most situations, this approach will add overhead without caching anything. In the worst case scenario, it will fill up your cache with entries for each authenticated user and push out more valuable anonymous pages that can be reused for thousands of visitors.

In most cases, the best approach is to maintain a single cache that is used for all users. Any content that cannot be served to all users can be passed through to Apache. Because Drupal uses a cookie to indicate the account of a logged-in user, the easiest way to spot requests that need fresh, uncached content is to ignore requests with cookies. However, that cookie will also be added to requests for images, JavaScript files, CSS files, and other supporting media assets.

Although the HTML page itself is likely to change for logged in users, there's no reason that Varnish can't serve up cached versions of the support assets. Taking this into consideration, it's a good idea to discard any cookies that are sent by the browser when requesting such files:

# Respond to incoming requests.
sub vcl_recv {
  # ...code from above.

  # Always cache the following file types for all users.
  if (req.url ~ "(?i)\.(png|gif|jpeg|jpg|ico|swf|css|js|html|htm)(\?[a-z0-9]+)?$") {
    unset req.http.Cookie;
  }
}

So far, so good. Stripping cookies from requests for static files allows them to be cached for both anonymous and authenticated users. While it works fine for out-of-the-box Drupal sites, however, there are unfortunately quite a few other ways that cookies can be set on your site. The most common culprits are statistics tracking scripts (like Google Analytics) and advertising servers. Ad scripts in particular have a terrible habit of setting cookies through JavaScript. For the most part, Varnish and Drupal are not concerned at all with these cookies, but since any cookie passed by the browser will cause Varnish to pass the request to Apache, we need to take care of them.

There are multiple approaches to handling this problem, and most administrators start by trying to build a "blacklist" of cookies to strip out of the request, leaving only the ones in which they have interest. This usually results in a configuration file that looks something like this:

// Remove has_js and Google Analytics __* cookies.
set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");

This approach will usually work for a short period of time, but as soon as an ad script or some new piece of JavaScript adds a cookie (like Comment module, Flag module, or any of many other modules), Varnish will cease to cache the page. You'll have to track down the new cookie and add it to the blacklist manually.

Instead, we use an "inclusion" list, where all cookies but a few will be automatically stripped from the request. This logic is a lot more verbose, but it is definitely a more sustainable solution:

# Respond to incoming requests.
sub vcl_recv {
  # ...code from above.

  # Remove all cookies that Drupal doesn't need to know about. ANY remaining
  # cookie will cause the request to pass-through to Apache. For the most part
  # we always set the NO_CACHE cookie after any POST request, disabling the
  # Varnish cache temporarily. The session cookie allows all authenticated users
  # to pass through as long as they're logged in.
  if (req.http.Cookie) {
    set req.http.Cookie = ";" + req.http.Cookie;
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
    set req.http.Cookie = regsuball(req.http.Cookie, ";(SESS[a-z0-9]+|NO_CACHE)=", "; \1=");
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

    if (req.http.Cookie == "") {
      # If there are no remaining cookies, remove the cookie header. If there
      # aren't any cookie headers, Varnish's default behavior will be to cache
      # the page.
      unset req.http.Cookie;
    }
    else {
      # If there are any cookies left (a session or NO_CACHE cookie), do not
      # cache the page. Pass it on to Apache directly.
      return (pass);
    }
  }
}

Once you've tamed cookies, there's one other "enemy" of caching you need to plan for: the "Accept-Encoding" header sent by different browsers. Each browser sends information to the server about what kinds of content encodings (compression) it supports. All modern browsers now support "gzip" compression, but they all inform the server that they support it in different ways. For example, modern browsers report Accept-Encoding in the following ways:

Firefox, IE: gzip, deflate
Chrome: gzip,deflate,sdch
Opera: deflate, gzip, x-gzip, identity, *;q=0

In addition to the headers sent by the browser, Varnish must also pay attention to the headers sent by Apache, which usually include lines like this:

Vary: Accept-Encoding

This means that Varnish will store a different cache for every version of the "Accept-Encoding" header it receives from different browsers! That means you'll be maintaining separate cached copies of your web site for each different browser, and in some cases for different versions of the same browser. This is a huge waste of space, since every browser actually supports "gzip" but just reports it differently. To prevent this confusion, we include this segment in vcl_recv:

# Handle compression correctly. Different browsers send different
# "Accept-Encoding" headers, even though they mostly all support the same
# compression mechanisms. By consolidating these compression headers into
# a consistent format, we can reduce the size of the cache and get more hits.
# @see: http://varnish.projects.linpro.no/wiki/FAQ/Compression
if (req.http.Accept-Encoding) {
  if (req.http.Accept-Encoding ~ "gzip") {
    # If the browser supports it, we'll use gzip.
    set req.http.Accept-Encoding = "gzip";
  }
  else if (req.http.Accept-Encoding ~ "deflate") {
    # Next, try deflate if it is supported.
    set req.http.Accept-Encoding = "deflate";
  }
  else {
    # Unknown algorithm. Remove it and send unencoded.
    unset req.http.Accept-Encoding;
  }
}

Conclusions

Varnish is an amazing and incredibly efficient tool for serving up common resources from your site to end-users. Besides simply making your site faster, it can also add redundancy to your setup by acting as a full backup if the web servers fail. In order to make Varnish serve as both an effective backup and an efficient caching layer, it needs to clean up incoming headers from the browser, strip down cookies, and consolidate the "Accept-Encoding" header.

After such extensive explanations it's easy to get overwhelmed, but the good news is that the VCL file provided here can quickly be deployed to almost any Drupal site and start working immediately. For most sites no further customization is needed, and sites that need to tweak it will have a good head start towards huge reductions in server load. On our sites, Varnish is usually able to handle about 85% of the traffic without ever touching the web servers. Even during peak times with hundreds of thousands of requests coming in per hour, Varnish can hum along at less than 5% CPU usage of an average 4-core server. Instead of scaling out your web servers horizontally, adding a few Varnish machines in front of them can save a huge amount of processing and speed up your site at the same time.

If you haven't already, grab the actual default.vcl file that we use on our sites and read it through start to finish. Now with all of the individual pieces explained in-depth above, we hope you can use it as a starting point for your own VCL configuration. Happy caching!

Nate Haug

Senior Drupal Architect


Comments

Greg Lund-Chaix

Thanks!

Thanks for posting this, Nate!

I'd puzzled through some of what you're recommending here through trial and error and didn't quite have it working well enough to go into production. You filled in the parts I needed (like the grace stuff and better cookie handling) much more gracefully than my techniques.

This excellent post has also reminded me that I need to update the Varnish config I posted a while back - it has that exact cookie hash misconfiguration you refer to here. I'll definitely update it so I'm not contributing to the misinformation. :-)


Rudy Grigar

Thanks!

It's great to see this information shared and available. Thank you! I'm also happy to see the configurations I've been using are very close. :)

I'm also using:
-p log_hashstring=off \
in /etc/sysconfig/varnish since it's unused. (A mailing-list post from John Adams at Twitter has some config explanations.)

I've yet to play with Varnish on :443 w/ SSL; is the SSL handled by the load balancer and routed to Varnish unencrypted?


nate

Varnish with SSL

I've yet to play with Varnish on :443 w/ SSL; is the SSL handled by the load balancer and routed to Varnish unencrypted?

Yep, you got it exactly. Varnish does not handle SSL itself. If it receives a request that is SSL encrypted it will pass it directly to Apache. For us though, we have the load balancer decrypt/encrypt and all traffic between Varnish and Apache is normal HTTP (but we keep using a different port number so we can tell if the page will eventually be secured).


Lars Peters

That's great

Thank you very much for this excellent and detailed post, Nate. This is going to be very helpful with our to-be-completed Varnish setup.


moshe weitzman

cookie stripper

Nice article!

When we configure Varnish to unset a cookie, is it then invisible to PHP and javascript code? Or is it only stripped for Varnish's purposes? Presumably these ad vendors and analytics vendors need these cookies.


eaton

That's what the whitelist is for...

When we configure Varnish to unset a cookie, is it then invisible to PHP and javascript code? Or is it only stripped for Varnish's purposes? Presumably these ad vendors and analytics vendors need these cookies.

Nate is probably the person to answer conclusively, but the purpose of the "whitelist" is to allow those ad and analytics cookies through. Because Varnish is acting as a buffer between outside requests and the actual Apache servers, they never have a chance to see cookies that it's stripped.

I think the assumption in the article is that preserving cache performance is more important than accounting for changes in ad code, at least in the short term. If you're working with a site where that assumption isn't true, using a blacklist of blocked cookies might work better.


moshe weitzman

It seems to me like Varnish

It seems to me like Varnish could choose to serve the page from cache but still make available to javascript all the current http headers (i.e. the cookies). Stripping could only be done for the purposes of cache calculation.


Rudy Grigar

Google analytics is a good

Google analytics is a good example here. When the cookie's are stripped, they are stripped from the recv subroutine so that the request to the backend apache daemon is free of cookies that prevent Varnish from caching requests. The cookies still exist until Varnish takes them away from the request to Apache -- this method works for GA.

This allows varnish to cache the requests it sends and receives from the backend, while still handling the cookies. This only works with cookie's that Drupal doesn't need to know about, and is the reason for the whitelist -- if there is a cookie you need to pass to apache/drupal it needs to be in the whitelist. Otherwise the cookie will only prevent Varnish from caching the request.


nate

Cookies stripped out for PHP, not for JavaScript

The cookies get stripped out before the request to the backend, so PHP no longer has access to them. You might be able to adjust it somehow to use a different header for the check, but unfortunately VCL doesn't have variables you can define to work with (other than access control lists like "internal" one described in this article).

The cookies are definitely still available to JavaScript, since it's client-side technology that is unaffected by Varnish's stripping.


sdboyer

Varnish version compatibility

Very, very nice post, folks.

A quick note: the attached default vcl will NOT work with Varnish <2.1.0; the beresp (BackEnd RESPonse) object representing the response was introduced in that version. In earlier versions, you referred to the backend's response in sub vcl_fetch simply as obj, and with a number of different properties. There might be other incompatibilities too, but that one in particular has bitten me.


Rudy Grigar

Hey Sam, The beresp vs. obj

Hey Sam,

The beresp vs. obj change in vcl_fetch is the only big change in the vcl config between 2.0.x and 2.1.x -- there are obviously other changes and additional features, but beresp is the only one that bites.

Saint mode is worth looking at in 2.1.x as well.

If you're stuck on 2.0.x there are other init options you'll want to set in /etc/sysconfig/varnish to keep things running smoothly -- but an upgrade to 2.1.x is even better :)


nate

Use 2.1.* or higher with this VCL

Yes, thanks for the warning sdboyer! You are very correct. I would highly suggest all users upgrade to 2.1.x if you're going to use this VCL file. That's not the only big change: Varnish also switched to a different regular-expression parser, and all patterns are now case sensitive! That'll bite you too, and this VCL file won't parse on 2.0.x. In our patterns above, that's what the (?i) provides (case-insensitivity).


Rudy Grigar

Good catch, I forgot about

Good catch, I forgot about the case-sensitive change. That one may bite even more, as your VCL will compile without issue and stop matching on the regex.

FWIW the complete changelog from 2.0->2.1 is here.


nate

Saint Mode is Something Else

I didn't get into Saint Mode because it didn't apply to us, but I accurately described Grace in the article. We even turned off Apache entirely for 6 hours to make sure it behaved as we expected and kept pumping out pages even with the backends entirely off. Saint Mode is basically Varnish's way of putting a particular server in the corner and saying, "you've been bad, and I'm not going to request that page from you anymore." Instead Varnish will try the same request again on a different server, hoping it will have better luck. Since our health checks should take out unhealthy servers anyway, Saint Mode may just cause the same request to fail multiple times due to a programming error. If all your web servers are identical, the request probably won't succeed no matter how many times you ask your backends.


Josh Koenig

Thanks Nate

This is a fantastic article. Thanks for putting all this out there and moving the ball forward!


axolx

Thank you!

Worth the weight of the page in gold :-P Thank you for this awesome contribution to one of the least documented parts of hosting a large scale Drupal site!


Ryan Hartman

Great post

Thanks for the super detailed post on varnish. This is a great resource, and will prove very useful as I rebuild the infrastructure for my current employer.


dalin

Thanks for this Nate. About

Thanks for this Nate. About a week ago I was thinking of starting a wiki page in the High Performance g.d.o group for a default .vcl but hadn't gotten around to it yet.

A few thoughts:

I've had issues in the past with stripping cookies from static resources for imagecache/image_resize_filter when the user is logged in. It logged them out. I had to add an exception for those paths. I'll play with your .vcl this week and see if that is still a problem.

Also to make this work with i18n you'll need something like the following in vcl_hash

// Uncomment if different languages are served at the same URL.
//  if( req.http.Accept-Language ) {
//    set req.hash += req.http.Accept-Language;
//  }

Yannis

html no cache

Thanks a lot for the article!

I try to use your vcl but I cannot get the html page to get served from varnish. In the headers I get "cache-control: max-age=0".

Do I need to set something differently in the .htaccess?


Michael Dance

Drupal Version?

Is this able to work with Drupal 6, or do I need to use Pressflow/Drupal 7?


Rich K.

I had to use /16

acl internal {
  "172.19.0.0"/16;
  "172.21.0.0"/16;
  "172.30.0.0"/16;
}

sub vcl_recv {

  # Also blocked admin pages except for internal IPs; I don't know if this helps...
  ...
  if ((req.url ~ "^/(cron|install)\.php$" || req.url ~ "^/(admin/)" || req.url ~ "^/admin$") && !(client.ip ~ internal)) {
    # Have Varnish throw the error directly.
  ...

Reply

Victor

DB Cluster

Hello, really nice article, very helpful.

Just one thing... do you have any other post explaining how to set up the DB cluster for Drupal?

Reply

Ben Buckman

Memcache check in status.php

On my setup (Pressflow 6, memcached extension, memcache module), the memcache_connect() function used in status.php doesn't exist. This worked instead with the memcached extension:

if (isset($conf['cache_inc'])) {
  $memcache = new Memcached;
  foreach ($conf['memcache_servers'] as $address => $bin) {
    list($ip, $port) = explode(':', $address);
    if (!$memcache->addServer($ip, $port)) {
      $errors[] = 'Memcache bin <em>' . $bin . '</em> at address ' . $address . ' is not available.';
    }
  }
}
Reply

Ben Buckman

Session API

Also on D6/Pressflow, the Session API module's session_api_session cookie (even on anonymous pages) confuses Varnish. I don't entirely understand the cookie-handling logic you describe here... how do you deal with cookies like Session API's?
Thanks! This post was awesome.

Reply

John

Please elaborate and explain

Please elaborate and explain the JavaScript part.
What is the inclusion list?
I did not get how this part works:

    set req.http.Cookie = ";" + req.http.Cookie;
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
    set req.http.Cookie = regsuball(req.http.Cookie, ";(SESS[a-z0-9]+|NO_CACHE)=", "; \1=");
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

Before using your method, i was also using

set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|__utma_a2a|has_js)=[^;]*", "");
set req.url = regsuball(req.url,"\?gclid=[^&]+$",""); # strips when QS = "?gclid=AAA"
set req.url = regsuball(req.url,"\?gclid=[^&]+&","?"); # strips when QS = "?gclid=AAA&foo=bar"
set req.url = regsuball(req.url,"&gclid=[^&]+",""); # strips when QS = "?foo=bar&gclid=AAA" or QS = "?foo=bar&gclid=AAA&bar=baz"
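As a rough sketch of what those three gclid rules do (Python's `re.sub` standing in for VCL's `regsuball`, which also substitutes globally):

```python
import re

def strip_gclid(url):
    """Strip Google's gclid tracking parameter, mirroring the three VCL rules."""
    url = re.sub(r"\?gclid=[^&]+$", "", url)   # QS is only "?gclid=AAA"
    url = re.sub(r"\?gclid=[^&]+&", "?", url)  # QS starts with "?gclid=AAA&foo=bar"
    url = re.sub(r"&gclid=[^&]+", "", url)     # gclid in the middle or at the end
    return url

print(strip_gclid("/node/1?gclid=AAA"))          # → /node/1
print(strip_gclid("/node/1?gclid=AAA&foo=bar"))  # → /node/1?foo=bar
```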

Now I do not need all these?

Thank you

Reply

Fabianx

That is a tricky, but really

That is a tricky, but really awesome trick :-) used here:

Basically, it removes all spaces between cookies, re-adds a space only in front of the allowed cookies, then removes every cookie that has no space after its semicolon, and finally trims the leading and trailing separators.

Scenario:

0. a=b; SESS4564645=123; c=d;  NO_CACHE=Y; x=y
1. a=b;SESS4564645=123;c=d;NO_CACHE=Y;x=y
2. a=b; SESS4564645=123;c=d; NO_CACHE=Y;x=y
3. a=b; SESS4564645=123; NO_CACHE=Y
4. SESS4564645=123; NO_CACHE=Y

And that's it :-).

So rather than listing every cookie to remove, it strips everything except the SESS* and NO_CACHE cookies.
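For anyone who wants to verify the steps, here is a minimal Python sketch of the same chain (`re.sub` standing in for VCL's `regsuball`; both substitute globally):

```python
import re

def strip_cookies(cookie_header):
    """Mimic the VCL regsuball chain: keep only SESS* and NO_CACHE cookies."""
    c = ";" + cookie_header
    c = re.sub(r"; +", ";", c)                              # 1. drop spaces after every ";"
    c = re.sub(r";(SESS[a-z0-9]+|NO_CACHE)=", r"; \1=", c)  # 2. re-add a space for kept cookies
    c = re.sub(r";[^ ][^;]*", "", c)                        # 3. remove cookies without the space marker
    c = re.sub(r"^[; ]+|[; ]+$", "", c)                     # 4. trim leading/trailing separators
    return c

print(strip_cookies("a=b; SESS4564645=123; c=d;  NO_CACHE=Y; x=y"))
# → SESS4564645=123; NO_CACHE=Y
```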

Best Wishes,

Fabian

Reply

John

A few possible additions to

A few possible additions to your nice code that I'm using after finding them on other sites:

  if (req.request != "GET" &&
      req.request != "HEAD" &&
      req.request != "PUT" &&
      req.request != "POST" &&
      req.request != "TRACE" &&
      req.request != "OPTIONS" &&
      req.request != "DELETE") {
    /* Non-RFC2616 or CONNECT which is weird. */
    return (pipe);
  }

  if (req.http.Expect) {
    # Too hard
    return (pipe);
  }

  if (req.request != "GET" && req.request != "HEAD") {
    # Not cacheable by default
    return (pass);
  }

  if (req.http.Authorization || req.http.Cookie) {
    /* Not cacheable by default */
    return (pass);
  }

  # Add a unique header containing the client address
  remove req.http.X-Forwarded-For;
  set req.http.X-Forwarded-For = client.ip;

  # Remove the server signature (these two lines belong in vcl_fetch, where beresp exists)
  unset beresp.http.Server;
  set beresp.http.Server = "";

Do you disagree with any of these? Do you think they may be useless? Thanks

Reply

nate

Unnecessary. (They're the defaults)

Early in the article I pointed out that the default Varnish settings are always appended to your own set of rules. The rules you've listed here are all defaults and therefore unnecessary to specify manually. See the default.vcl file included with Varnish in the article.

Reply

Fabianx

I find it still useful to add

I still find it useful to add them at the beginning of my rules (Varnish appends its defaults **after** the custom rules) whenever I use any rules that do a return (pass), because it prevents me from accidentally passing a request that needs to be piped.

Reply

John

I think that one of the

I think that one of the options that normalizes the compression headers in Varnish may cause the site-perf.com tester not to work if it asks for "compressed". Any idea why?

Reply

nate

Gzip, Deflate, and None only

The compression headers make it so that Gzip and Deflate are the only supported compression mechanisms. If the site you're referencing checks other compression mechanisms, they'll get back uncompressed content. Considering 99.9% of browsers support GZip and Deflate, there's no reason to serve up a 3rd compression mechanism (apparently for "compressed").

Reply

Ruben

IP management in load balance varnish setup

First of all, thanks for a great tutorial. I have improved my knowledge of Varnish a lot reading your tips.

On the other hand, I have some doubts about your setup at the network layer. I guess you work with private IPs, mainly for the backend Apache servers, and maybe for the Varnish servers too.

My question is: should the public IP address be on the load balancers or on Varnish? I'm testing HAProxy and it works well, but the content is served from HAProxy's public IP address.

Thanks anyway for a great tutorial :)

Reply

nublaii

Varnish hardware

What machine do you guys have running Varnish? You mention a 4-core machine and 3GB of RAM.

We are trying to set up something along these lines and I was curious to see if we're on the right track ;)

Reply

Anonymous

Are there any settings for

Are there any settings for Varnish to serve the static files of a specific domain from a subdomain (which actually points to the same server IP), so as to increase page-load speed through parallel connections?
Thanks

Reply

Fabianx

Yes, there is, but serving it

Yes, there is, but serving it from the same server IP does not make much sense: the parallel-connection limit is a browser-side problem, Varnish has nothing to do with it, and Varnish will already use all the connections the backend has available.

But if you have them on a different server, just defining a different backend and doing :

backend static_backend { .host = "192.10.0.2"; }

if (req.url ~ "^/files/static/(.*)$") {
  set req.backend = static_backend;
}

works.

However this will only speed up your backend processing and that should never be a problem with Varnish.

So what you probably really want is the CDN module, and Varnish will then transparently cache the files from the static subdomain as well.

You can then make this cookie-less either via Apache configs or in VCL:

if (req.http.host ~ "static.example.com" ) {
  unset req.http.Cookie;
  return (lookup);
}

And something similar in vcl_deliver for set-cookie (to deal with 404s) ...

Best Wishes,

Fabian

Reply

dalin

Many of the things in the "Do

Many of the things in the "Do not cache these paths." list indicate bugs with those modules - things that should be using POST rather than GET.

Reply

kurt

Excellent article!

Thanks a lot for posting this hands-on guide!
It helped me a lot to get varnish up and running.

One question though. I saw the (very) interesting graphic at the top of this article (load balancers / multiple webservers / DB Clusters).

Any chance you might post an article in the nearby future which covers this setup? Can't wait ;-)

Regards.
kurt

Reply

inchirieri masini

I already have it on my VPS

I already have it on my VPS and I can say it makes a huge difference. I started analyzing page speed after I realized that Google takes it into account. I have a bunch of small car rental sites in Romania, and even if they don't get a lot of visitors, the CMS can do a lot of work serving them.

Reply

Varinder Singh

Varnish in front of multiple web servers with page cache.

Hello Guys,

We have a Varnish server running in front of two web servers, and these web servers have their own page cache. Varnish fetches a page from one web server after 5 minutes and picks up the updated content. But when it then fetches the page from the other web server, it gets old content, assumes it is fresh data, and preserves it for another 5 minutes, because the page cache on that web server had not been updated yet. We keep the page cache for 10 minutes.
Do you guys have any solution for this problem?

Thanks

Reply

Arthur

Multiple Varnish Servers

Hi, although this article is from like two years ago, I still find it immensely helpful! You mentioned (and show in the chart) that you are using multiple Varnish servers. How can one sync between them so that server A doesn't cache whatever server B already got?

Reply

nate

Two servers means two copies

We don't actually have any syncing set up between the two servers; we just let each page be requested from the back-end twice before it's fully cached by both Varnish servers. While it's not great that the page is generated twice, this setup is for redundancy. If your site gets hit a thousand times in a 5-minute period, generating the page only twice still isn't bad.

Additionally, you could choose to use both the Varnish cache and the normal Drupal page cache, in which case the second hit from Varnish would actually be served by Drupal's page cache. This can result in cache overlaps though (essentially doubling the stale content lifetime), so it's up to you if generating the page twice is worse than increasing your cache lifetime.

Reply

Marwan Kandeel

Hi Nate,

Hi Nate,

I'm trying to use the below code in Varnish default.vcl:

# Respond to incoming requests.
sub vcl_recv {
  # ...code from above.

  # Remove all cookies that Drupal doesn't need to know about. ANY remaining
  # cookie will cause the request to pass-through to Apache. For the most part
  # we always set the NO_CACHE cookie after any POST request, disabling the
  # Varnish cache temporarily. The session cookie allows all authenticated users
  # to pass through as long as they're logged in.
  if (req.http.Cookie) {
    set req.http.Cookie = ";" + req.http.Cookie;
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
    set req.http.Cookie = regsuball(req.http.Cookie, ";(SESS[a-z0-9]+|NO_CACHE)=", "; \1=");
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

    if (req.http.Cookie == "") {
      # If there are no remaining cookies, remove the cookie header. If there
      # aren't any cookie headers, Varnish's default behavior will be to cache
      # the page.
      unset req.http.Cookie;
    } else {
      # If there are any cookies left (a session or NO_CACHE cookie), do not
      # cache the page. Pass it on to Apache directly.
      return (pass);
    }
  }
}

My problem is that after including this code and restarting the service, I'm not able to log in to my OpenCart admin. It looks like this code is not letting the cookies pass through to Apache. Can anyone help please?

Reply

Amir

SSL Backend

Not sure how you are getting Varnish to speak to the SSL backends? What am I missing? I have pound --> varnish --> http backend. I want to also have pound --> varnish --> https backend, and IDEALLY I want to filter the traffic through VCL security rules.

Reply

Phil

7.2x image tokens

Running into an issue with Varnish and uploading images, led me to the article:

http://drupalblog.torchbox.com/post/45261810748/varnish-and-drupal-7-20-...

I quickly realized that the article was written for Varnish 2.x and won't work with 3.x, so I came up with the following:

sub vcl_hash {
  hash_data(req.http.host);

  if (req.url ~ "itok=") {
    set req.http.X-IMG-TMP = regsub(req.url, "\?(.*)itok=.+[^&]", "\1");
    hash_data(req.http.X-IMG-TMP);
  }
  return (hash);
}

To be honest, I don't have a clue what I'm doing... Varnish does successfully start, however some pages are getting cached in such a way that the current page still renders even when clicking on another page link...

Question is: "What is the importance of having Varnish handle the new Drupal 7.2x image tokens? Is there any reason to have Varnish do something with them?"

Thanks!

Reply

alfanet

Help

Dear, thank you for the tutorial. I did all of this config, but I always get the error: Varnish Error 503 Service Unavailable. I changed the timeout and it's the same. I just want to install Varnish only; I don't need Apache or Nginx. Please help me if you can, and if you have a working config file, please send it to me. Thank you

Reply