by Robert Douglass on January 2, 2008

Using Lighttpd as a static file server for Drupal

This article discusses Drupal 5.5 and Lighttpd 1.4, with special consideration for the imagecache module 5.x-1.3.

Building websites that can handle high amounts of traffic involves finding points of scalability in the network architecture. There is a lot of discussion about database replication and redundant web servers, but very little discussion has taken place about serving static files from a different server than the one which executes PHP. This article shows how you can configure Drupal to serve static files from a separate server, potentially on a separate machine. There is even a solution for those of you who are using the imagecache module.

Static vs Dynamic content

A webpage in your browser usually consists of HTML plus JavaScript, images, CSS, and perhaps some Flash. The typical order of events is that the browser requests the HTML, parses it, and then begins to request the additional .js, .css, .png, .gif, and .flv files. This sequence is well diagrammed on the Yahoo! Developer Network. For Drupal sites, the initial request that returns the HTML is a dynamic request, meaning PHP code and a database are required to generate the HTML. The rest of the requests, however, reference static files. These files require neither PHP nor a database and can be returned to the browser by the simplest and most lightweight web servers available. This is the fundamental difference between a dynamic request (one that requires a script language like PHP) and a static request (one which returns a simple file from the file system).

For a web server like Apache to serve a dynamic Drupal page, it must load extra software (mod_php) in order to be able to execute PHP. This extra software increases the memory footprint of the server and reduces the total number of requests that it can handle before the machine's physical memory is exhausted. Even more memory intensive is the act of executing PHP. A Drupal site with lots of modules installed that handles a lot of data from the database can easily require 64M of memory per thread. This is a huge expenditure of memory compared to the 1-2M it takes to serve a static file. Since Apache recycles its worker threads, you end up in a situation where the same 64M monster that created the Drupal HTML is also used for serving a .jpg file. This is a huge waste of resources.
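
To make the cost concrete, here is a rough back-of-the-envelope sketch in PHP. The 2048 MB of memory reserved for the web server is a hypothetical figure chosen only for illustration, and max_workers() is an illustrative helper, not part of Drupal or Apache. The 64M and 2M worker sizes are the ones quoted above.

```php
<?php
/**
 * Back-of-the-envelope estimate: how many workers of a given size fit into
 * the memory set aside for the web server.
 */
function max_workers($available_mb, $per_worker_mb) {
  return (int) floor($available_mb / $per_worker_mb);
}

// Hypothetical figure, not taken from the article.
$available_mb = 2048;

print max_workers($available_mb, 64) . "\n"; // workers sized for Drupal/PHP
print max_workers($available_mb, 2) . "\n";  // workers sized for static files
```

Under these assumptions the same memory holds 32 PHP-sized workers or 1024 static-file-sized ones, which is the gap a dedicated static file server is meant to exploit.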

Adding a static file server to your network thus brings the following advantages:

  • Static files are served from a server optimized for the task
  • Better utilization of "heavy" PHP server resources
  • A new point for scalability; you can add more machines to run static file servers if needed using typical load balancing techniques

Sharing files

Where exactly are the static files in a Drupal site? Here's a list of the typical places:

  • files/: Files uploaded by the application
  • misc/: Drupal's Javascript files and some images
  • modules/: Any module might have extra static files, such as .css, images, .js and so forth
  • themes/: Most themes introduce .css and images
  • sites/all/: More modules and themes can be found here

With Drupal's static files scattered throughout a directory structure that also contains all of the PHP files needed for Drupal execution, the idea of collecting them separately and putting them on a separate static server is impractical. The solution is to make the entire directory structure available to the static file server and disallow that server from serving requests for the PHP files.

How the files become available to the static file server is another question. One approach is to host the files on an NFS server which all web servers and the static file server mount. Another approach is to use rsync to keep redundant copies of the entire directory structure available to every server. There are other options as well.

It is even possible to run the static file server on the same machine as the dynamic web server and have the two share a document root. This is the approach I take in this article as it demonstrates the principle adequately.

Routing requests

The next issue is how requests should be routed. One approach would be to have a proxy server which routes requests for static files to a separate server. This leaves the application blissfully unaware of the concerns of the static file server. If you have experience with this approach please discuss it in the comments.

A second approach, which I take in this article, is to adjust the application to write the URLs to static resources differently. In Drupal this turns out to be a very simple task because all URLs are generated by a small number of functions. A minor tweak to these functions is sufficient to send all static file requests to the appropriate server.

Here is a survey of the changes that I needed to make to Drupal 5.5 and the Garland theme in order to serve all static files from a separate server. A patch with the complete set of changes is attached below.

Add a variable to $conf in settings.php:

<?php
$conf = array(
  'static_url' => 'http://static.example.com/',
);
?>

In every function where static files get included in the HTML, update the logic to use the static_url variable. This includes:

  • includes/common.inc: drupal_get_css(), drupal_get_js()
  • includes/file.inc: file_create_url()
  • includes/theme.inc: theme_get_setting(), theme_image()

<?php
// use either the URL to the static server (if set) or the base_path()
$base = variable_get('static_url', base_path());

// Anywhere a resource is being included, use $base
$output .= '<style type="text/css" media="'. $media .'">@import "'. $base . $preprocess_file .'";</style>'. "\n";
?>
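
As a minimal sketch of the same idea, the joining of base and file path can be pulled into one small function. The name static_file_url() is hypothetical and is not part of Drupal 5 or the attached patch. In a real site the base would come from variable_get('static_url', base_path()); here it is passed in as a plain argument so the example runs on its own.

```php
<?php
/**
 * Hypothetical helper: join a base (static server URL or base path) and a
 * file path with exactly one slash between them.
 */
function static_file_url($base, $path) {
  return rtrim($base, '/') . '/' . ltrim($path, '/');
}

// With a static server configured:
print static_file_url('http://static.example.com/', 'misc/drupal.js') . "\n";

// Falling back to a base path of "/" when no static server is set:
print static_file_url('/', 'themes/garland/style.css') . "\n";
```

Trimming the slashes on both sides means it does not matter whether static_url is configured with or without a trailing slash.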

For the theme, I added a variable to all templates called static_base.

<?php
// in template.php
function _phptemplate_variables($hook, $vars) {
  $vars['static_base'] = variable_get('static_url', base_path());
  ...
?>

The static_base variable can then be used where files are directly linked in the theme. For example, in Garland's page.tpl.php:

<style type="text/css" media="print">@import "<?php print $static_base . path_to_theme() ?>/print.css";</style>

The static file server

I chose to use Lighttpd (aka Lighty) to be the static file server based on its reputation for being lightweight and fast, and because I had never used it before. There are many web servers that can be optimized for the task, however.

I installed Lighttpd on Mac OS X (Leopard) using MacPorts. After the package was installed I made the following changes to the lighttpd.conf file:

## This is the same document root as is used by the Apache server for Drupal
server.document-root        = "/Users/robert/public_html/"

## Make sure that directory listings don't work.
index-file.names            = ( )

## For the Mac OS X users
server.event-handler = "freebsd-kqueue"

## This plays a similar role to the .htaccess directive that hides certain file extensions.
## ".php" is listed so that PHP source such as settings.php is never sent as plain text.
url.access-deny             = ( "~", ".php", ".engine", ".inc", ".info", ".install", ".module", ".profile", ".po", ".sh", ".sql", ".theme", ".tpl.php", ".xtmpl" )

## I want Apache to run on 80 so this needs to be something else
server.port                = 81
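
Because the static server shares Drupal's document root, it is worth checking which paths a deny list would actually refuse. The following is a rough PHP model of suffix-based denial, assuming that url.access-deny refuses any request whose path ends with one of the listed strings. The function name path_is_denied() and the shortened list are illustrative only, and this is not lighttpd's own code.

```php
<?php
/**
 * Rough model of suffix-based denial: a request path is refused when it ends
 * with any of the configured strings.
 */
function path_is_denied($path, array $deny_suffixes) {
  foreach ($deny_suffixes as $suffix) {
    if ($suffix !== '' && substr($path, -strlen($suffix)) === $suffix) {
      return TRUE;
    }
  }
  return FALSE;
}

// Shortened, illustrative list; a real list would be longer.
$deny = array('~', '.inc', '.module', '.tpl.php');

var_dump(path_is_denied('/includes/common.inc', $deny));        // refused
var_dump(path_is_denied('/misc/drupal.js', $deny));             // served
var_dump(path_is_denied('/sites/default/settings.php', $deny)); // served unless ".php" is in the list
```

The last case is the one to watch: a server that does not execute PHP will send a .php file as plain text unless that suffix is denied.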

I also added this to my .bash_profile so that I could start lighttpd from the command line easily:

PATH=$PATH:/opt/local/sbin
export PATH

You may have to take the additional steps of adjusting your firewall to allow a process to bind to port 81, and some of the directories referenced in the lighttpd.conf file may need to be created.

Once you've finished with the above steps you can test Lighty's configuration with the following command:
sudo lighttpd -t -f /opt/local/etc/lighttpd/lighttpd.conf
You can start the server with this command:
sudo lighttpd -D -f /opt/local/etc/lighttpd/lighttpd.conf

A production instance of lighttpd will require some further configuration. Most notably, you'll want to use mod_expire to set expiry dates in the future and mod_compress to compress textual content for faster transfer over the wire.

Turn off KeepAlive

One of the big gains that can be had by using a static file server is the freedom for your dynamic server to close the connection to the client immediately after serving the initial HTML. In your main web server's configuration you can now turn off the KeepAlive directive. For my setup, using Apache 2 (via MAMP), this involved adding the following line near the top of httpd.conf:

KeepAlive Off

A restart of Apache is necessary.

Using /etc/hosts

Your static files should always come from a different hostname than your dynamic HTML. Browsers limit the number of simultaneous connections they open per hostname, so a second hostname lets them download more files in parallel. On your local machine you can simulate this by editing /etc/hosts:

127.0.0.1       localhost static

This adds a hostname static that also resolves to the local server. Your $conf in settings.php will then look like this:

<?php
$conf = array(
  'static_url' => 'http://static:81/',
);
?>

Testing it out

With Drupal patched and Lighttpd up and running, you should have a Drupal site that gets its HTML from Apache and its static files from the static file server. Please describe any problems (and their solutions) that you run into in the comments below and I'll update the article accordingly.
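
One quick way to check the result is to look at which hosts the generated HTML points to. The sketch below is a hypothetical helper, not part of Drupal: asset_hosts() pulls every src, href and @import URL out of a chunk of HTML with a simple regular expression and counts them per host. The sample markup is made up for the example and uses the static:81 hostname from the /etc/hosts section.

```php
<?php
/**
 * Hypothetical check: count the host[:port] of every src/href/@import URL
 * in a chunk of HTML. URLs without a host are counted as '(same host)'.
 */
function asset_hosts($html) {
  $hosts = array();
  $pattern = '/(?:src|href)="([^"]+)"|@import "([^"]+)"/';
  preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
  foreach ($matches as $m) {
    // Group 2 is only filled for @import matches.
    $url = !empty($m[2]) ? $m[2] : $m[1];
    $parts = parse_url($url);
    if ($parts === FALSE || empty($parts['host'])) {
      $key = '(same host)';
    }
    else {
      $key = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');
    }
    $hosts[$key] = isset($hosts[$key]) ? $hosts[$key] + 1 : 1;
  }
  return $hosts;
}

// Made-up sample of what a patched page might contain.
$html = '<style type="text/css" media="all">@import "http://static:81/themes/garland/style.css";</style>'
  . '<script type="text/javascript" src="http://static:81/misc/drupal.js"></script>'
  . '<a href="/node/1">Read more</a>';

print_r(asset_hosts($html));
```

If stylesheets, scripts or images still show up under '(same host)', a function that builds their URLs has probably been missed. Ordinary page links are expected to stay there.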

Imagecache

The above techniques will work well with any Drupal site that doesn't use imagecache. The imagecache module presents a special challenge because it plays sneaky games with Drupal's 404 error handling. When Drupal receives a request for a resource that isn't on the file system and isn't a valid Drupal path, the Drupal application serves a 404 Not Found page, resulting in a full Drupal bootstrap. Imagecache takes advantage of this and generates image derivatives during this process. This means that imagecache requires requests for static images to come to Drupal - at least in the case when they are 404 Not Found.

To sidestep this problem we want Lighttpd to redirect any 404 requests to the Drupal server. In your lighttpd.conf file, change the following directives so that we can run a small Perl script to do the redirect.

## Uncomment the "mod_cgi" option from server.modules
server.modules              = (
                               "mod_cgi",
...

## Add a 404 handler
## The path is relative to your Drupal installation
server.error-handler-404   = "/scripts/redirect.pl"

Now you must add a script to the scripts directory of your Drupal installation and make it executable.

Save this to scripts/redirect.pl

#!/usr/bin/perl

# Here localhost is the hostname for the Drupal server. Update so that your domain or hostname
# is used instead.
print "Location: http://localhost$ENV{REQUEST_URI}\n\n";
exit;

Update the URL in the script to use your hostname or domain instead of localhost, if necessary. The file must be executable by the user running the Lighty webserver. Now, when Lighty encounters a 404 request, it will be forwarded to the Drupal web server where imagecache will be able to make the derivative image. After that, Lighty will be able to serve requests for that image.
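
The flow just described can be modelled in a few lines of PHP. This is only an illustration of the decision being made and is not code that runs on either server. The function name route_static_request() and the temporary directory are assumptions made for the example.

```php
<?php
/**
 * Illustrative model of the flow: an existing file is served by the static
 * server, anything else is redirected to the Drupal host so that imagecache
 * can build the derivative.
 */
function route_static_request($docroot, $request_uri, $drupal_base) {
  // Drop the query string before mapping the URI onto the file system.
  $path = parse_url($request_uri, PHP_URL_PATH);
  if (is_string($path) && is_file(rtrim($docroot, '/') . $path)) {
    return array('serve', $path);
  }
  return array('redirect', rtrim($drupal_base, '/') . $request_uri);
}

// Throwaway document root with one existing file.
$docroot = sys_get_temp_dir() . '/static_demo_' . uniqid();
mkdir($docroot . '/files', 0777, TRUE);
file_put_contents($docroot . '/files/logo.png', 'x');

print_r(route_static_request($docroot, '/files/logo.png', 'http://localhost'));
print_r(route_static_request($docroot, '/files/imagecache/thumb/logo.png', 'http://localhost'));
```

The first request is served directly. The second is redirected to the Drupal host, and once the derivative has been written to disk the same request would take the first branch.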

Please note that imagecache 2.0 is said not to need this workaround.

Conclusion

Setting up a static file server to handle all non-dynamic requests is a moderately simple task that is well worth the while for sites that need to get the best performance and handle the most visitors. It provides a new point of scalability, manages existing server resources better, and can lead to overall faster page loads.

Robert Douglass

Comments

fago

running drupal with lighttpd?

Interesting article, thanks. But why not just run drupal with lighttpd?

So you needn't hack drupal. It's even easy to put the fastcgi php processes on another server.

Reply

robert

No reason not to

I didn't know that lighttpd supported splitting processes over physical machines. Could you describe this more?

Reply

robert

Yes, very interesting

Some observations:

1. By running a static file server that doesn't run PHP in any way I get a lighter process (no mod_fastcgi).
2. I don't see any advantage in having the fastcgi processes run on distributed back ends. If I were going to have many physical machines running PHP processes I'd make them first class web servers.
3. One of the key points of my article is that you can have your PHP web server turn off KeepAlive. If the same Lighttpd process is responsible for the PHP request and all js, css and gif requests, this becomes impossible.

Reply

fago

1. of course, each module

1. Of course, each module bloats the process a bit. However, there is only one lighttpd process, and mod_fastcgi is nothing big, as PHP is spawned as a separate process (fastcgi!). Furthermore, you can do without Apache...

2. As you prefer :) But so you could also separate static files to a dedicated machine.

3. I just tried to conditionally turn it off with lighty - works. However, in this case I don't think one should turn it off.

Reply

robert

#2 - intended for separate machine

In the end I only set it up on the same machine for the convenience of testing, but the whole idea here is that the static file server can be a separate machine (or more than one if needed). I'll add a note to the article.

Reply

Anonymous

test results

I'm curious, have you tried load testing this set-up vs. a normal setup? If so, what kind of improvements did you see?

Reply

robert

Have not yet benchmarked

One of my motivations for getting the instructions published is to encourage people to help me benchmark and refine them. I work hard but the days never have enough hours for me to get everything done =)

Reply

ricoflan

Good one!

As someone who has to work in narrow bandwidth markets I tend to worry a lot about things like response times and http requests. Will set this up and do some internal testing to see if I get any gains. thanks...

Reply

Matthijs

Why not use 2 instances of Apache?

Why don't you just start another instance of Apache with a httpd.conf that is optimized for static content? (ie. no php cgi etc) That technique is mostly used to run PHP5 and PHP4 concurrently but will theoretically work just as good for your purpose.

I have used that setup (proxy) for the dual PHP purpose but I figure it will work just as good for Apache. I must however add I have not done this on Apache 2, only on the 1.x versions.

Reply

robert

2 Apaches just as good

I'm not trying to suggest that Lighttpd is the only server for the purpose. The two main ideas of the article are 1) that static file serving is desirable, and 2) that you can do it at the application level by generating different URLs. The server configuration is only a side topic here. The outstanding question of the article is whether using a proxy server to do URL rewriting is a better solution?

Reply

Rainer

Simpler approach

I use a much simpler approach.
(1) Set up a fast http server on a different port or different server (I wrote my own).
(2) Redirect requests to static files (like files in the /files folder or images) via mod_rewrite to this server.

But anyway, think it's a very useful idea to have different servers here!

Reply

John Quinn

cdn hosting of static content

nice article robert. have you considered the applicability of your approach for hosting static drupal content on a cdn (akamai, mirror-image etc.). strikes me that your solution provides a good recipe for that, and that for content heavy sites particularly, this would be a good way to improve both performance and scalability.

Reply

John Lewis

Memcache could do this just as easily...

If you already have memcache running, why wouldn't you simply use that to store these static files in RAM? Failover is already taken care of with memcache and you can scale as well as you do with lighty... but with lighty you have to configure a lot more "tricks" to get things to work and it becomes a point of failure.

Reply

JoepH

What about private files?

Thanks for the great article!

I'm going off topic, but the feature I would like to see in Drupal is one of my most wanted ones!

Of course this approach is not suitable in case you have setup the private files option.
It is just such a great pity that this is an all-or-nothing decision. It would be really great if you could configure the private files option per content type.
Then you could serve the static files where there is no security required with the method described here.

Reply

Anonymous

Truly agree - Private files in Drupal, a need of the day!

Off topic too!

For a private server the needs are different. Static files are applicable to the images, CSS and JS files associated with the HTML, but for the main application we need to use private files. Private files make the file handling process itself dynamic, in that we need PHP to identify permissions and the location of the files. This is handled by a separate menu callback in Drupal. We cannot use static files for this purpose.

We are looking at three approaches:

1. To handle files to be placed in any physical directory (not in the root) within the same server

2. To handle files to be placed in any physical directory (not in the root) in a different server - here we are using the FTP protocol to access and place files in the file server. This is a custom code written by us.

3. To handle files to be placed in any physical directory (not in the root) in a different server - using Curl

Questions that arise - Which of these are scalable? Which is most secure?

Reply

dkruglyak

[PATCH] Static file serving, locally or via CDN

Please, take a look at my patch for 5.5 that combines Robert's code with CDN support (using CDN Integration module) and also JavaScript aggregation:
http://drupal.org/node/149402#comment-699065

The idea is to funnel all static file requests to a new static_file function which resolves either to a static base or to a CDN request. However, if you want to use it without JS aggregation you need to review / change the code.

I filed an issue to get this into D6 core. Would anyone want to help get it in there?
http://drupal.org/node/212369

Reply

Khalid -- 2bits.com

Less painful approach

Here is another approach that would be easier to setup (external to Drupal), and would work in many cases (but not all).

If you use Squid to front end Apache, Squid will cache the static files for you. They now get served from Squid's cache and Apache only does the PHP side of things.

So, if Apache is on a different machine than Squid, you effectively do what you did with lighty.

This would work as long as you have one internet facing server (Squid) frontending one or more Apache.

Oh, and consider nginx over lighttpd. The latter has been suffering from memory leaks that nginx does not suffer from.

Reply

Webmaster

thanx

I hope this article will help us improve the performance of our free press release website. We are using a VPS from Linode with 720 MB memory for hosting this site. More visitors to this site result in more Apache child processes. Each process takes more than 35 MB of physical memory, almost exhausting all the physical memory. I tried to limit the number of Apache processes to 10MB only to find that images and other static resources load slowly. So what I want is two sets of Apache processes: one that can consume more physical memory and handle requests for PHP pages, and a larger number of Apache processes that take no more than 4-5MB each to serve static resources like images, CSS, JavaScript etc. I like this solution but I am a bit wary of patching the core modules and common Drupal files. This might result in problems and more work when I upgrade to Drupal 6.0. Another solution I want to try is running two Apache servers, one for static and one for dynamic. If anyone is trying out a similar experiment, please share your experience.

Reply

Anonymous

Rainer wrote: (2) Redirect

Rainer wrote:

(2) Redirect requests to static files (like files in the /files folder or images) via mod_rewrite to this server.

Any chance you could share the .htaccess snippet that achieved this?

Reply

Evan Donovan

Also having issues w/port 81 being blocked

As Gustavo also reported, we are having issues w/many of our website's visitors reporting that they can't see our pages properly now that we have Lighttpd serving static content through port 81. Is there a workaround?

Reply

Web Dude

KeepAlive ON!

KeepAlive is a mechanism that keeps a child process that has served a request from immediately closing the connection to that request's sender client. Instead, the connection will be "kept alive" for a period of KeepAliveTimeOut seconds, after which the connection will be closed. This is very interesting because opening a connection is a costly mechanism, and because it is very common that a client sends a group of requests in a short period of time.

For example, when you download a web page, you send a request for the page itself, and one request for each file referenced on this page (images, css, javascript, etc.). Without KeepAlive, you would open a new connection to apache/lighttpd for every one of these requests, and they would all be served by multiple child processes. With KeepAlive on, all requests made when you connect to a page are served over the same TCP connection by the same child process, which allows significant speed improvements.

Flickr, Wikipedia, Google, etc use KeepAlive to serve their static files. Don't turn it off.

Reply

Anonymous

you're missing the point

keepalive is a tradeoff
the connection is kept open, which
a) allows the same process to serve multiple requests from the same client
b) blocks the process from talking to any other clients for a set amount of time

so
if you move static content serving to lighty
you want to turn keepalive off to avoid httpd processes just sitting around after serving one request ;)

Reply

chirale

I'm making some tests using

I'm making some tests using nginx on port 82, using this rewrite rule:

# if file exists

RewriteCond %{REQUEST_FILENAME} -f

# and it's not a PHP file, serve content via nginx (on port 82):

RewriteRule !(.*?)\.(php) http\:\/\/%{HTTP_HOST}\:82%{REQUEST_URI} [L]

With this simple rewrite rule, even imagecache works. The "-f" does the magic, because imagecache generates the derivative image on a 404 error (a PHP script in Drupal). After the image generation, RewriteCond is fulfilled and the image is served through port 82.

Reply

j0rd

Mod_Rewrite to pass requests to Lighttpd

I use this little mod_rewrite gem on a non-drupal install to force all "static" file extensions to lighttpd.

Enable mod_rewrite and mod_proxy and put this into your apache.conf

RewriteEngine On
RewriteRule "^/(.*)\.(jpg|jpeg|gif|png|swf|pdf|avi|mpeg|mpg|mp3)$" "http://127.0.0.1:81/$1.$2" [P]
ProxyPassReverse / http://127.0.0.1:81/

What this does is redirect all .jpg .jpeg .gif .png ... file extensions to your server which hosts static files. In my example that host is http://127.0.0.1:81 . This is perhaps not the most optimal solution, but I've tested it in a production environment and it seamlessly allows Apache to serve 30%+ more requests with about an hour's work.

Another bonus is that requests not ending in these extensions will not hit your lighttpd, which may be incorrectly configured for security (i.e. serving .php files as text), as I think this example's configuration could be if your static file host is accessible via a public URL and shares your Drupal directory.

url.access-deny             = ( "~", ".engine", ".inc", ".info", ".install", ".module", ".profile", ".po", ".sh", ".sql", ".theme", ".tpl.php", ".xtmpl" )

^ Notice that it does not include .php, which could get you in trouble if your static file server is accessible by the public and shares the same path as your Drupal. Think settings.php.

Reply

j0rd

I resolved the issue with Apache Proxy / Lighttpd / Imagecache

apache.conf

...
RewriteEngine On

# Checks to see if the filepath requested exists, if it doesn't apache will serve the request
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -f

# If the file exists and has a file extension we want to pass to lighttpd...
# do so here and skip all other rule sets and pass to proxy
RewriteRule "^/(.*)\.(css|js|ico|jpg|jpeg|gif|png|swf|pdf|avi|mpeg|mpg|mp3)$" "http://127.0.0.1:6666/$1.$2" [NC,P,L]

# Replace domain.tld with your site name. This is for a "multi-site install"
# If the file exists, but has an imagecache path, we need to rewrite it for lighttpd.
# Do some perl matches for info we need from the path
RewriteCond %{REQUEST_URI} ^/sites/domain.tld/files/imagecache/(.+)/sites/domain.tld/files/(.*\.(jpg|jpeg|gif|png|swf|pdf|avi|mpeg|mpg|mp3))$

# %1 and %2 will come from our above matching. Nifty!
# We pass the real file path and have lighttpd serve it from the filesystem
RewriteRule "^(.+)$" "http://127.0.0.1:6666/sites/domain.tld/files/imagecache/%1/%2" [NC,P,L]

ProxyPassReverse / http://127.0.0.1:6666/
...

TADA! Now your lighttpd just needs to serve the same path as your apache...but i have some more tweaks. If you're interested....read on.

lighttpd.conf

....
# mod_expire must go before mod_compress if you use these optional modules
server.modules = (
"mod_access", # included by default
"mod_alias", # included by default
"mod_accesslog", # included by default

# Optional Additional Speedups (explained below)
"mod_expire", # not required, but an extra speedup for lighttpd
"mod_compress", # not required, but an extra speedup for lighttpd
)

# Make sure this is the same directory that your apache is serving
server.document-root = "/home/domain.tld/htdocs"

# This is less required with my setup, since I only forward certain file extensions to lighttpd and it's not available on a public IP
# If your server is on a public IP, make sure you set this the same as your apache and include extra extensions you do not want lighttpd to serve as text like PHP.
# Lighttpd will send them as text and this is a huge security vulnerability
url.access-deny = ( "~", ".htaccess", ".htpasswd", ".php", ".cgi", ".pl", ".engine", ".inc", ".info", ".install", ".module", ".profile", ".po", ".sh", ".sql", ".theme", ".tpl.php", ".xtmpl", "CVS", ".svn" )

# My lighttpd is setup on port 6666
server.port = 6666

# For security purposes I strongly advise to host this on a non-public IP like 127.0.0.1
server.bind = "127.0.0.1"

# Optional other speed ups like compression for CSS/JS and Cache-Expiry headers

#### compress module
# If you want to enable compression of your javascript and css that lighttpd serves...
# Enable the compression module and set this up.

compress.cache-dir = "/var/tmp/lighttpd/cache/compress/"
compress.filetype = ("text/plain", "text/html", "text/css", "application/x-javascript")

#### external configuration files needed for compression module
# In order for lighttpd to translate mime-types there is a script you can download from google.
# You only need this if you enable compression

## mimetype mapping
include_shell "/usr/share/lighttpd/create-mime.assign.pl"

#### expire module
# Another optional speed up which will set cache for your files lighttpd serves.
# YSlow says set this for like a year, I figured 180 was enough

$HTTP["url"] =~ "^/" {
expire.url = ( "/" => "access plus 180 days")
setenv.add-response-header = ( "Cache-Control" => "public, max-age=15552000" )
}

...

As you can see from the lighttpd.conf file I've added mod_compress to compress the CSS and JS I send with lighttpd and added expiry headers to all files served to promote caching of files served under lighttpd.

Imagecache / Apache Proxy and Lighttpd debunked.

Using imagecache 2.0 btw.

Reply

Manju Sheshadri

Squid is doing the same right?

I think squid is doing the same without any code changes. Does anyone see any difference between squid and this approach?

Reply