Scaling Apache for LAMP

Apache Feather

Buddhika Siddhisena

(Co-Founder & CTO of THINKCube Systems)

bud@thinkcube.com | twitter @geekaholic

What is Apache

The folk story is that Apache was named after "a patchy server", the result of the NCSA httpd server being patched a lot. The project was started by Brian Behlendorf.

Alt Brian and I

Today, Apache is still the most popular web server out there, running more than half of the websites on the net. It is actively developed by the Apache Software Foundation along with many other software projects.

Besides its primary function of serving websites, Apache can also be configured as a reverse proxy for load balancing.

Installing Apache

We assume you're working on Ubuntu. Translate to your favorite distro accordingly.

The easiest method of installing Apache along with PHP and MySQL (aka LAMP) is to use the tasksel command.

tasksel

Alternatively install each package manually:

apt-get install apache2 libapache2-mod-php5 mysql-server

Installing a sample LAMP app - Drupal

In order to test Apache performance as we tune it, it is good to set up a real-world, full-fledged CMS such as Drupal.

  • Download the latest version of Drupal from drupal.org
  • Follow the Drupal setup guide
  • Install the Devel module into the Drupal modules directory
  • Log in to Drupal as admin and, using the Devel module, populate Drupal with sample data for testing
    (Configuration -> Development -> Generate Content)

Setting up Benchmarking tools

Setup Autobench

Autobench is a handy script to stress test a webserver by sending an increasing number of requests. It works by calling the httperf tool iteratively with increasing parameters.

Download autobench and follow directions to compile.

In order to plot graphs, you need to install gnuplot via apt. As of this writing, the script used to plot the graph has a bug calling the current version of gnuplot and requires the following minor modification.

$ sudo vi $(which bench2graph)

line ~78 should be

echo set style data linespoints >> gnuplot.cmd

Baseline benchmark with Autobench

Let's benchmark our standard Apache setup to get an idea of default performance.

autobench --single_host --host1 localhost --uri1 /drupal --quiet     \
       --low_rate 20 --high_rate 200 --rate_step 20 --num_call 10 \
       --num_conn 5000 --timeout 5 --file results.tsv

The above tests a single host, localhost/drupal, starting at 20 connections per second and increasing in steps of 20 up to 200 connections per second, with each connection making 10 requests. The total number of connections per test is capped at 5000, and any request that takes more than 5 seconds to respond is counted as unsuccessful.
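To make the sweep concrete, here is a minimal sketch that lists the rates such a run would step through, along with the requests sent and a rough duration for each step. The function name autobench_schedule is hypothetical (it is not part of autobench), and the duration is only an estimate (connections divided by rate) that ignores timeouts and server slowdown.

```shell
#!/bin/sh
# Hypothetical helper (not part of autobench): list the connection rates a
# run would step through, with requests sent and approximate duration per step.
# Arguments mirror --low_rate, --high_rate, --rate_step, --num_call, --num_conn.
autobench_schedule() {
    low=$1 high=$2 step=$3 calls=$4 conns=$5
    rate=$low
    while [ "$rate" -le "$high" ]; do
        # Each connection issues $calls requests; duration is a rough estimate.
        echo "rate=${rate} conn/s requests=$((conns * calls)) duration=~$((conns / rate))s"
        rate=$((rate + step))
    done
}

# Values taken from the autobench command above.
autobench_schedule 20 200 20 10 5000
```

With the values above this prints ten steps, from 20 to 200 connections per second, which gives a feel for how long the full benchmark is likely to run.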

Plotting the results

Using the results.tsv file and the included bench2graph utility, you can plot a graph into a PostScript file.

bench2graph results.tsv results.ps

Sample graph image

Tuning Apache - Enable GZip

Compressing pages with gzip decreases network overhead and makes pages load faster, thereby reducing the amount of time a client stays connected. All modern browsers support rendering compressed content.

In order to benchmark its effect, you can install a tool such as Firebug on the client side.

Firebug screenshot

Tuning Apache - Enable GZip contd...

Enable the mod_deflate module. On Ubuntu:

a2enmod deflate && a2enmod headers

Then we'll configure deflate to compress everything except images.

sudo vi /etc/apache2/mods-enabled/deflate.conf

<Location />
    # Insert filter
    SetOutputFilter DEFLATE

    # Don't compress images
    SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary

    # Make sure proxies don't deliver the wrong content
    Header append Vary User-Agent env=!dont-vary

</Location>
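One way to confirm that compression is active, besides Firebug, is to inspect the response headers from the command line. The sketch below assumes curl is installed and that the test site is served at http://localhost/drupal/ as in the benchmark; the helper name is_gzipped is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the HTTP response headers read from
# stdin include "Content-Encoding: gzip" (header names are case-insensitive).
is_gzipped() {
    tr -d '\r' | grep -qi '^content-encoding:[[:space:]]*gzip'
}

# Example usage against the test site (assumes curl is installed and
# Apache is serving http://localhost/drupal/):
#   curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' http://localhost/drupal/ | is_gzipped \
#       && echo "compressed" || echo "not compressed"
```

Requesting an image URL the same way should report "not compressed" if the exclusion rule is working.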

Apache configuration tuning

There are a few key parameters that can be tuned:

  • KeepAlive - By default it's set to On, which is good. Clients can reuse one connection for multiple requests via HTTP/1.1.
  • KeepAliveTimeout - Better to keep it low. Defaults to 15 sec. Make sure that's enough. A rule of thumb is 1.5 to 2 times your page load time.
  • TimeOut - The default is 5 minutes, which might be too long to tie up one process. Adjust accordingly.
  • StartServers, MinSpareServers, MaxSpareServers - Generally, even on a busy site, you may not need to tweak these. Apache can self-regulate.
  • MaxClients - The maximum number of clients (processes or threads) Apache will handle simultaneously.
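Before changing anything, it helps to see what these directives are currently set to. The following is a minimal sketch; the function name show_tuning is hypothetical, and the path in the usage comment assumes the Ubuntu layout.

```shell
#!/bin/sh
# Hypothetical helper: print the tuning directives discussed above as they
# are currently set in an Apache config file. Commented-out lines are skipped
# and matching is case-insensitive (Apache directive names are too).
show_tuning() {
    grep -Ei '^[[:space:]]*(KeepAlive|KeepAliveTimeout|Timeout|StartServers|MinSpareServers|MaxSpareServers|MaxClients)[[:space:]]' "$1"
}

# Example usage (path assumes the Ubuntu layout):
#   show_tuning /etc/apache2/apache2.conf
```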

Calculating MaxClients

First, list the memory (RSS, in KB) used by each running Apache process:

ps -eafly |grep apache2|awk '{print $8}'|sort -n

Use free to figure out how much memory is available. Cache also counts as free memory, but you might want to leave some and not assume all of the cache can be reclaimed.

free

By dividing free memory by the average memory used by an Apache thread, you can estimate the value of MaxClients.

e.g., assuming Apache memory usage and free memory are as follows:

$ ps -eafly |grep apache2|awk '{print $8}'|sort -n

816
3896
3896
3896
3896
20844

$ free
             total       used       free     shared    buffers     cached
Mem:        508904     447344      61560          0     141136     213468
-/+ buffers/cache:      92740     416164
Swap:       407544       4364     403180

Memory available ~= 60000 KB (free) + 100000 KB (cached) ~= 160 MB
Memory per thread ~= 4 MB
Then a safe value for MaxClients = 160 / 4 = 40
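The same arithmetic can be scripted. The sketch below is an illustration, not the method used above: the function name estimate_max_clients is hypothetical, and it takes the median of the per-process figures rather than a mean, so that the small grep process and the large parent process do not skew the result.

```shell
#!/bin/sh
# Hypothetical helper: estimate MaxClients.
#   $1    - memory available to Apache, in KB
#   stdin - one per-process RSS value (KB) per line, e.g. from the ps command above
# Uses the median RSS (an assumption made here) as the per-process figure.
estimate_max_clients() {
    sort -n | awk -v avail="$1" '
        { rss[NR] = $1 }
        END {
            if (NR == 0) exit 1
            median = rss[int((NR + 1) / 2)]
            printf "%d\n", avail / median
        }'
}

# Sample figures from the example above: ~160000 KB available.
printf '%s\n' 816 3896 3896 3896 3896 20844 | estimate_max_clients 160000
```

For the sample figures this prints 41, in line with the rounded estimate of 40 above.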

Improving PHP performance

We can improve PHP performance by

  1. Caching pages (useful if dynamic content doesn't change often)
  2. PHP opcode optimizations (pre-compile PHP)

Fortunately, we can get the benefit of both using PHP APC, which is a PHP accelerator!

apt-get install php-apc

You can verify installation by loading a php page having phpinfo(); and searching for apc. Or if you have php5-cli installed:

php -r "phpinfo();" | grep apc

Using memcached

Memcached is a distributed cache for storing key-value pairs in memory for faster access with reduced trips to the database. Some popular PHP apps can use memcached if available. Note that memcached does not automatically accelerate PHP; the app has to support it!

apt-get install memcached php5-memcache

service memcached start

More Tips for improving performance

  • Keep DirectoryIndex file list as short as possible.
  • Whenever possible, disable .htaccess via AllowOverride None
  • Use Options FollowSymLinks to simplify the file access process in Apache
  • Minimize the use of mod_rewrite, or at least of complex regexes
  • If logs are unnecessary, disable them or log to another server via syslog.
  • For Deny/Allow rules, use IPs rather than domains (prevents superfluous DNS lookups).
  • Do not enable HostnameLookups (DNS is slow).
  • For dynamic sites see if you can separate dynamic vs static content into two servers

Simple Scalable Architectures

Architecture overview

In terms of scaling the web server, there are a few options.

1. Single machine (Scale vertically)

Basically the easiest to set up. Scaling is a matter of buying a better server or upgrading it!

1-tier architecture

2. App-DB machines (2-Tier)

Separate DB from App, as a result each can be scaled separately.

2-tier architecture

Architecture overview contd...

3. Load balancer + App-DB machines (3-Tier)

The load balancer (aka reverse proxy) routes requests between multiple backend HTTP servers while caching results.

3-tier architecture

Data Independence

Data scalability is beyond the scope of this presentation.

It is good to isolate the data from the app by hosting it on a separate server. This way the two aspects can be scaled independently. Some methods to consider:

  • Store DB data on MySQL running on a separate server
  • Share data files between servers using NFS or rsync
  • Cluster MySQL across multiple servers using MySQL Cluster
  • Cluster the file system via DRBD or GFS2, or, as Facebook does, using BitTorrent

Setting up an HTTP accelerator using Apache

Apache as a reverse proxy

In this setup, the reverse proxy server is what the user will contact, while the real web server can be hidden behind a private network.

1. On the reverse proxy server:

Enable required modules for caching reverse proxy.

a2enmod proxy

a2enmod proxy_connect

a2enmod proxy_http

a2enmod cache

2. Configure proxy module

vi /etc/apache2/mods-enabled/proxy.conf

<Proxy *>
        AddDefaultCharset off
        Order deny,allow
        Deny from all
        Allow from all
</Proxy>
ProxyVia On

Apache as a reverse proxy contd...

3. Setup (public) virtual host

Next we configure an otherwise empty virtual host for the public site. Instead of serving content from a document root, it acts as a reverse proxy.

vi /etc/apache2/sites-available/public-domain.com

<VirtualHost *:80>

    ServerName your-public-domain.com

    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>

    ProxyPass / http://your-private-domain.com/
    ProxyPassReverse / http://your-private-domain.com/

</VirtualHost>

a2ensite public-domain.com
service apache2 reload

Using Nginx

What is Nginx?

Nginx logo

  • Nginx was designed as a reverse proxy first, and an HTTP server second
  • Unlike Apache, Nginx uses a non-blocking, event-driven process model

Two modes of operation for Nginx:

  1. Use Nginx for the static content and Apache for PHP
  2. Use FastCGI to embed PHP

Nginx process model in a nutshell

  • Receive a request, trigger events in a process
  • The process handles all the events and returns the output
  • The process handles events in parallel
  • The limitation is that PHP can no longer be embedded (mod_php) inside the process, as PHP is not asynchronous
  • Unlike Apache, Nginx doesn't have an .htaccess equivalent. You need to reload the server after making any change, making it difficult to use for shared hosting

Using Nginx and Apache side-by-side

In this setup we put Nginx as the frontend HTTP accelerator and Apache as the backend app server. If you want to run both on the same physical server, you'll need to either change the Apache port from 80 to another value or bind Apache and Nginx to their own IP addresses on the same server.

Listen 8080

or using the ip address

Listen 127.0.0.1:8080

Now we're ready to install Nginx

sudo apt-get install nginx

Apache style virtual host in Nginx

Nginx uses a different format for defining virtual hosts than Apache.

<VirtualHost *:80>
      DocumentRoot "/usr/local/www/mydomain.com"
      ServerName mydomain.com
      ServerAlias www.mydomain.com
      CustomLog /var/log/httpd/mydomain_access.log common
      ErrorLog /var/log/httpd/mydomain_error.log
      ...
</VirtualHost>

becomes...

server {
      root /usr/local/www/mydomain.com;
      server_name mydomain.com www.mydomain.com;

      # by default logs are stored in nginx's log folder
      # it can be changed to a full path such as /var/log/...
      access_log logs/mydomain_access.log;
      error_log logs/mydomain_error.log;
      ...
}

Redirecting all PHP requests to Apache

The following example will serve all static content via Nginx while passing dynamic content (PHP) to Apache.

server {
    listen   80 default;
    server_name  localhost;

    access_log  /var/log/nginx/localhost.access.log;

    location / {
        root   /var/www;
        index  index.php index.html index.htm;
    }

    ## Pass all .php requests to Apache
    location ~ \.php$ {
        # these two lines tell Apache the actual IP of the client being forwarded
        proxy_set_header X-Real-IP  $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;

        # this next line adds the Host header so that Apache knows which vHost to serve
        proxy_set_header Host $host;

        # And now we pass the request on to Apache
        proxy_pass http://127.0.0.1:8080;
    }
}

Using Nginx all the way

There is some debate as to whether using Nginx with PHP via FastCGI is actually faster than passing requests to Apache. In any case, let's see how we can set up a pure Nginx-based model.

Install PHP-FPM

Unlike Apache, Nginx has a hands-off approach to managing PHP processes and therefore requires manual intervention. Fortunately, as of PHP 5.3.3, there is a built-in FastCGI Process Manager (FPM), which looks after the PHP processes.

apt-get install php5-fpm

If you're on Ubuntu 10.04 LTS, then you'll need to add a special repository before you can install php5-fpm.

add-apt-repository ppa:brianmercer/php && apt-get update

Next start the php5-fpm process

service php5-fpm restart

Install PHP-FPM contd...

Finally, modify the Nginx configuration to use FastCGI to handle all files having the php extension.

vi /etc/nginx/sites-available/defaul

server {
    listen   80 default;
    server_name  localhost;

    access_log  /var/log/nginx/localhost.access.log;

    location / {
        root   /var/www;
        index index.php index.html index.htm;
    }

    ## Pass all .php files in the /var/www directory to PHP-FPM
    location ~ \.php$ {
        fastcgi_pass   127.0.0.1:9000;
        fastcgi_index  index.php;
        fastcgi_param  SCRIPT_FILENAME  /var/www$fastcgi_script_name;
        include fastcgi_params;
    }
}

Thank you

Apache Feather

bud@thinkcube.com | twitter @geekaholic
