Using Lighttpd as a static file server for Drupal
This article discusses Drupal 5.5 and Lighttpd 1.4, with special consideration for the imagecache module 5.x-1.3.
Building websites that can handle high amounts of traffic involves finding points of scalability in the network architecture. There is a lot of discussion about database replication and redundant web servers, but very little discussion has taken place about serving static files from a different server than the one which executes PHP. This article shows how you can configure Drupal to serve static files from a separate server, potentially on a separate machine. There is even a solution for those of you who are using the imagecache module.
Static vs Dynamic content
A webpage in your browser usually consists of HTML plus Javascript, images, CSS, and perhaps some Flash. The typical order of events is that the browser requests the HTML, parses it, and then begins to request the additional .js, .css, .png, .gif, and .flv files. This sequence is well diagrammed on the Yahoo! Developer Network. For Drupal sites, the initial request that returns the HTML is a dynamic request, meaning PHP code and a database are required to generate the HTML. The rest of the requests, however, reference static files. These files require neither PHP nor a database and can be returned to the browser by the simplest and most lightweight web servers available. This is the fundamental difference between a dynamic request (one that requires a scripting language like PHP) and a static request (one which returns a simple file from the file system).
For a web server like Apache to serve a dynamic Drupal page, it must load extra software (mod_php) in order to be able to execute PHP. This extra software increases the memory footprint of the server and reduces the total number of requests that it can handle before the machine's physical memory is exhausted. Even more memory intensive is the act of executing PHP. A Drupal site with lots of modules installed that handles a lot of data from the database can easily require 64M of memory per thread. This is a huge expenditure of memory compared to the 1-2M it takes to serve a static file. Since Apache recycles its worker threads, you end up in a situation where the same 64M monster that created the Drupal HTML is also used for serving a .jpg file. This is a huge waste of resources.
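To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The 64M and 2M figures are the ones quoted above; the 2048M of available memory is a hypothetical number chosen only for illustration, and `max_workers()` is a made-up helper, not a Drupal or Apache function.

```php
<?php
// Rough capacity estimate: how many concurrent workers fit in memory?
// max_workers() is a hypothetical helper written for this example.
function max_workers($available_mb, $per_worker_mb) {
  return (int) floor($available_mb / $per_worker_mb);
}

$available_mb = 2048; // assumption for illustration, not a measurement

// Workers that have executed Drupal and grown to about 64M each.
$heavy = max_workers($available_mb, 64);
// Lightweight workers that only ever serve static files (about 2M each).
$light = max_workers($available_mb, 2);

print "PHP-sized workers: $heavy\n";
print "Static-sized workers: $light\n";
```

Under these assumed numbers the same memory holds far more static-file workers than PHP-sized ones, which is the imbalance the rest of the article tries to exploit.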
Adding a static file server to your network thus brings the following advantages:
- Static files are served from a server optimized for the task
- Better utilization of "heavy" PHP server resources
- A new point for scalability; you can add more machines to run static file servers if needed using typical load balancing techniques
Sharing files
Where exactly are the static files in a Drupal site? Here's a list of the typical places:
- files/: Files uploaded by the application
- misc/: Drupal's Javascript files and some images
- modules/: Any module might have extra static files, such as .css, images, .js and so forth
- themes/: Most themes introduce .css and images
- sites/all/: More modules and themes can be found here
With Drupal's static files scattered throughout a directory structure that also contains all of the PHP files needed for Drupal execution, the idea of collecting them separately and putting them on a separate static server is impractical. The solution is to make the entire directory structure available to the static file server and disallow that server from serving requests for the PHP files.
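One way to picture the "disallow" rule is as a check on how the requested path ends. The sketch below is only a model of that idea, not how any particular web server implements it. The function name `is_denied()` and the suffix list are assumptions made for this example, and `.php` is included here as an extra assumption.

```php
<?php
// Hypothetical sketch: decide whether a static server should refuse a path.
// The suffix list is illustrative only; adjust it to your own site.
function is_denied($path, array $denied_suffixes) {
  foreach ($denied_suffixes as $suffix) {
    // Refuse the request when the path ends with a denied suffix.
    if (substr($path, -strlen($suffix)) === $suffix) {
      return TRUE;
    }
  }
  return FALSE;
}

$denied = array('.php', '.inc', '.module', '.install', '.engine', '.theme');

var_dump(is_denied('modules/node/node.module', $denied)); // code: refuse
var_dump(is_denied('misc/drupal.js', $denied));           // static: serve
```

The point of the design is that the static server never needs to know what Drupal is; it only needs a list of file endings it must not hand out.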
How the files become available to the static file server is another question. One approach is to
It is even possible to run the static file server on the same machine as the dynamic web server and have the two share a document root. This is the approach I take in this article as it demonstrates the principle adequately.
Routing requests
The next issue is how requests should be routed. One approach would be to have a proxy server which routes requests for static files to a separate server. This leaves the application blissfully unaware of the concerns of the static file server. If you have experience with this approach please discuss it in the comments.
A second approach, which I take in this article, is to adjust the application to write the URLs to static resources differently. In Drupal this turns out to be a very simple task because all URLs are generated by a small number of functions. A minor tweak to these functions is sufficient to send all static file requests to the appropriate server.
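The idea can be sketched outside of Drupal in a few lines. This is a simplified illustration, not the patch itself: `static_base()` and `static_file_url()` are hypothetical helpers, the `$conf` array stands in for the one in settings.php, and `$base_path` stands in for the value Drupal's `base_path()` would return.

```php
<?php
// Hypothetical helper, not part of Drupal: pick the prefix for static files.
// Falls back to the normal base path when no static server is configured.
function static_base(array $conf, $base_path = '/') {
  return isset($conf['static_url']) ? $conf['static_url'] : $base_path;
}

// Build the URL for a static file given its path relative to the document root.
function static_file_url(array $conf, $path, $base_path = '/') {
  // Trim a leading slash so the result never contains a double slash.
  return static_base($conf, $base_path) . ltrim($path, '/');
}

$conf = array('static_url' => 'http://static.example.com/');
print static_file_url($conf, 'misc/drupal.js') . "\n";

// Without the setting, the URL falls back to the normal base path.
print static_file_url(array(), 'misc/drupal.js') . "\n";
```

Because the fallback is the ordinary base path, a site without the setting keeps generating the same URLs it always did.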
Here is a survey of the changes that I needed to make to Drupal 5.5 and the Garland theme in order to serve all static files from a separate server. A patch with the complete set of changes is attached below.
Add a variable to $conf in settings.php:
<?php
$conf = array(
'static_url' => 'http://static.example.com/'
);
?>
In every function where static files get included in the HTML, update the logic to use the static_url variable. This includes:
- includes/common.inc: drupal_get_css(), drupal_get_js()
- includes/file.inc: file_create_url()
- includes/theme.inc: theme_get_setting(), theme_image()
<?php
// use either the URL to the static server (if set) or the base_path()
$base = variable_get('static_url', base_path());
// Anywhere a resource is being included, use $base
$output .= '<style type="text/css" media="'. $media .'">@import "'. $base . $preprocess_file .'";</style>'. "\n";
?>
For the theme, I added a variable to all templates called static_base.
<?php
// in template.php
function _phptemplate_variables($hook, $vars) {
$vars['static_base'] = variable_get('static_url', base_path());
...
?>
The static_base variable can then be used where files are directly linked in the theme. For example, in Garland's page.tpl.php:
<?php
<style type="text/css" media="print">@import "<?php print $static_base . path_to_theme() ?>/print.css";</style>
?>
The static file server
I chose to use Lighttpd (aka Lighty) to be the static file server based on its reputation for being lightweight and fast, and because I had never used it before. There are many web servers that can be optimized for the task, however.
I installed Lighttpd on Mac OS X (Leopard) using MacPorts. After the package was installed I made the following changes to the lighttpd.conf file:
## This is the same document root as is used by the Apache server for Drupal
server.document-root = "/Users/robert/public_html/"
## Make sure that directory listings don't work.
index-file.names = ( )
## For the Mac OS X users
server.event-handler = "freebsd-kqueue"
## This plays a similar function to the .htaccess directive that hides certain file extensions.
url.access-deny = ( "~", ".engine", ".inc", ".info", ".install", ".module", ".profile", ".po", ".sh", ".sql", ".theme", ".tpl.php", ".xtmpl" )
## I want Apache to run on 80 so this needs to be something else
server.port = 81
I also added this to my .bash_profile so that I could start lighttpd from the command line easily:
PATH=$PATH:/opt/local/sbin
export PATH
You may have to take the additional steps of adjusting your firewall to allow a process to bind to port 81, and some of the directories referenced in the lighttpd.conf file may need to be created.
Once you've finished with the above steps you can test Lighty's configuration with the following command:
sudo lighttpd -t -f /opt/local/etc/lighttpd/lighttpd.conf
You can start the server with this command:
sudo lighttpd -D -f /opt/local/etc/lighttpd/lighttpd.conf
A production instance of lighttpd will require some further configuration. Most notably, you'll want to use mod_expire to set expiry dates in the future and mod_compress to compress textual content for faster transfer over the wire.
Turn off KeepAlive
One of the big gains that can be had by using a static file server is the freedom for your dynamic server to close the connection to the client immediately after serving the initial HTML. In your main web server's configuration you can now turn off the KeepAlive directive. For my setup, using Apache 2 (via MAMP), this involved adding the following line near the top of httpd.conf:
KeepAlive Off
A restart of Apache is necessary.
Using /etc/hosts
Your static files should always come from a different hostname than your dynamic HTML. This allows the browser to make more efficient use of its connections. On your local machine you can simulate this by editing /etc/hosts:
127.0.0.1 localhost static
This adds a hostname static that also resolves to the local server. Your $conf in settings.php will then look like this:
<?php
$conf = array(
'static_url' => 'http://static:81/'
);
?>
Testing it out
With Drupal patched and Lighttpd up and running, you should have a Drupal site that gets its HTML from Apache and its static files from the static file server. Please describe any problems (and their solutions) that you run into in the comments below and I'll update the article accordingly.
Imagecache
The above techniques will work well with any Drupal site that doesn't use imagecache. The imagecache module presents a special challenge because it plays sneaky games with Drupal's 404 error handling. When Drupal receives a request for a resource that isn't on the file system and isn't a valid Drupal path, the Drupal application serves a 404 Not Found page, resulting in a full Drupal bootstrap. Imagecache takes advantage of this and generates image derivatives during this process. This means that imagecache requires requests for static images to come to Drupal, at least in the case when they are 404 Not Found.
To sidestep this problem we want Lighttpd to redirect any 404 requests to the Drupal server. In your lighttpd.conf file, change the following directives so that we can run a small Perl script to do the redirect.
## Uncomment the "mod_cgi" option from server.modules
server.modules = (
"mod_cgi",
...
## Add a 404 handler
## The path is relative to your Drupal installation
server.error-handler-404 = "/scripts/redirect.pl"
Now you must add a script to the scripts directory of your Drupal installation and make it executable.
Save this to scripts/redirect.pl
#!/usr/bin/perl
# Here localhost is the hostname for the Drupal server. Update so that your domain or hostname
# is used instead.
print "Location: http://localhost$ENV{REQUEST_URI}\n\n";
exit;
Update the URL in the script to use your hostname or domain instead of localhost, if necessary. The file must be executable by the user running the Lighty webserver. Now, when Lighty encounters a 404 request, it will be forwarded to the Drupal web server where imagecache will be able to make the derivative image. After that, Lighty will be able to serve requests for that image.
Please note that imagecache 2.0 is said not to need this workaround.
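The decision the static server makes in this workaround can be modelled in a few lines. This is only an illustration of the flow, under the assumption that a missing file is the sole trigger for the redirect. `route_static_request()` is a hypothetical function, the paths are made up, and the sketch leaves out production concerns such as path traversal checks.

```php
<?php
// Hypothetical model of the static server's decision, for illustration only.
// Returns array('serve', $file) when the file exists under $docroot,
// otherwise array('redirect', $url) pointing at the Drupal server.
function route_static_request($docroot, $request_uri, $drupal_host) {
  // Strip any query string before looking on disk.
  $path = parse_url($request_uri, PHP_URL_PATH);
  $file = rtrim($docroot, '/') . $path;
  if (is_file($file)) {
    return array('serve', $file);
  }
  // Same URL the Perl script builds: Drupal host plus the original URI.
  return array('redirect', 'http://' . $drupal_host . $request_uri);
}

// A derivative that has not been generated yet is handed over to Drupal.
list($action, $target) = route_static_request(
  sys_get_temp_dir(), '/files/imagecache/thumb/missing.jpg', 'localhost');
print "$action $target\n";
```

Once imagecache has written the derivative to disk, the same request takes the first branch and never reaches Drupal again.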
Conclusion
Setting up a static file server to handle all non-dynamic requests is a moderately simple task that is well worth the while for sites that need to get the best performance and handle the most visitors. It provides a new point of scalability, manages existing server resources better, and can lead to overall faster page loads.





Comments
running drupal with lighttpd?
Interesting article, thanks. But why not just run drupal with lighttpd?
So you needn't hack drupal. It's even easy to put the fastcgi php processes on another server.
No reason not to
I didn't know that lighttpd supported splitting processes over physical machines. Could you describe this more?
Lighty uses php as fastcgi.
Lighty uses php as fastcgi. You can spawn several php processes for it, here are some docs about it:
http://trac.lighttpd.net/trac/wiki/HowToSetupFastCgiIndividualPermission...
So you can let another machine run php, or even more as lighty can do load balancing by defining multiple php workers. This might also interest you:
http://trac.lighttpd.net/trac/wiki/Docs%3APerformanceFastCGI
yes, the docs could be better :/
Yes, very interesting
Some observations:
1. By running a static file server that doesn't run PHP in any way I get a lighter process (no mod_fastcgi).
2. I don't see any advantage in having the fastcgi processes run on distributed back ends. If I were going to have many physical machines running PHP processes I'd make them first class web servers.
3. One of the key points of my article is that you can have your PHP web server turn off KeepAlive. If the same Lighttpd process is responsible for the PHP request and all js, css and gif requests, this becomes impossible.
1. of course, each module
1. of course, each module bloats the process a bit. However, there is only lighttpd process and mod_fastcgi is nothing big, as php is spawned as separate process (fastcgi!). Furthermore you can save apache...
2. As you prefer :) But so you could also separate static files to a dedicated machine.
3. I just tried to conditionally turn it off with lighty - works. However in this case one shouldn't turn it off I think.
#2 - intended for separate machine
In the end I only set it up on the same machine for the convenience of testing, but the whole idea here is that the static file server can be a separate machine (or more than one if needed). I'll add a note to the article.
test results
I'm curious, have you tried load testing this set-up vs. a normal setup? If so, what kind of improvements did you see?
Have not yet benchmarked
One of my motivations for getting the instructions published is to encourage people to help me benchmark and refine them. I work hard but the days never have enough hours for me to get everything done =)
What a timely article...
I am almost certainly going to be putting this information to use very soon. Thanks a gazillion for the post. :-)
Good one!
As someone who has to work in narrow bandwidth markets I tend to worry a lot about things like response times and http requests. Will set this up and do some internal testing to see if I get any gains. thanks...
Why not use 2 instances of Apache?
Why don't you just start another instance of Apache with a httpd.conf that is optimized for static content? (ie. no php cgi etc) That technique is mostly used to run PHP5 and PHP4 concurrently but will theoretically work just as good for your purpose.
I have used that setup (proxy) for the dual PHP purpose but I figure it will work just as good for Apache. I must however add I have not done this on Apache 2, only on the 1.x versions.
2 Apaches just as good
I'm not trying to suggest that Lighttpd is the only server for the purpose. The two main ideas of the article are 1) that static file serving is desirable, and 2) that you can do it at the application level by generating different URLs. The server configuration is only a side topic here. The outstanding question of the article is whether using a proxy server to do URL rewriting is a better solution?
Simpler approach
I use a much simpler approach.
(1) Set up a fast http server on a different port or different server (I wrote my own).
(2) Redirect requests to static files (like files in the /files folder or images) via mod_rewrite to this server.
But anyway, think it's a very useful idea to have different servers here!
cdn hosting of static content
nice article robert. have you considered the applicability of your approach for hosting static drupal content on a cdn (akamai, mirror-image etc.). strikes me that your solution provides a good recipe for that, and that for content heavy sites particularly, this would be a good way to improve both performance and scalability.
Memcache could do this just as easily...
If you already have memcache running, why wouldn't you simply use that to store these static files in RAM? Failover is already taken care of with memcache and you can scale as well as you do with lighty... but with lighty you have to configure a lot more "tricks" to get things to work and it becomes a point of failure.
What about private files?
Thanks for the great article!
I'm going off topic, but the feature I would like to see in Drupal is one of my most wanted ones!
Of course this approach is not suitable in case you have set up the private files option.
It is just such a great pity that this is an all-or-nothing decision. It would be really great if you could configure the private files option per content type.
Then you could serve the static files where there is no security required with the method described here.
[PATCH] Static file serving, locally or via CDN
Please, take a look at my patch for 5.5 that combines Robert's code with CDN support (using CDN Integration module) and also JavaScript aggregation:
http://drupal.org/node/149402#comment-699065
The idea is to funnel all static file requests to a new static_file function which resolves either to a static base or to a CDN request. However, if you want to use it without JS aggregation you need to review / change the code.
I filed an issue to get this into D6 core. Would anyone want to help get it in there?
http://drupal.org/node/212369
Less painful approach
Here is another approach that would be easier to setup (external to Drupal), and would work in many cases (but not all).
If you use Squid to front end Apache, Squid will cache the static files for you. They now get served from Squid's cache and Apache only does the PHP side of things.
So, if Apache is on a different machine than Squid, you effectively do what you did with lighty.
This would work as long as you have one internet facing server (Squid) frontending one or more Apache.
Oh, and consider nginx over lighttpd. The latter has been suffering from memory leaks that nginx does not suffer from.
thanx
I hope this article will help us improve the performance of our free press release website. We are using a VPS from Linode with 720 MB of memory for hosting this site. More visitors to the site result in more Apache child processes. Each process takes more than 35 MB of physical memory, almost exhausting all the physical memory. I tried to limit the number of Apache processes to 10MB, only to find out that images and other static resources load slowly. So what I want is two sets of Apache processes: one that can consume more physical memory and handle requests for PHP pages, and a larger number of Apache processes that take no more than 4-5MB each to serve static resources like images, CSS, JavaScript etc. I like this solution but I am a bit wary of patching the core modules and common Drupal files. This might result in problems and more work when I upgrade to Drupal 6.0. Another solution I want to try is running two Apache servers, one for static and one for dynamic. If anyone is trying out a similar experiment, please share your experience.
Don't work
I tested your solution on my site and it doesn't work.
The browser doesn't show the static content (http://static:81/...)
I edited the hosts file, but it still doesn't work.
Rainer wrote: (2) Redirect
Rainer wrote:
(2) Redirect requests to static files (like files in the /files folder or images) via mod_rewrite to this server.
Any chance you could share the .htaccess snippet that achieved this?