Monthly Archives: April 2011

Scaling Django to 30,000 requests per second

For the latest series of Britain’s Got Talent, Live Talkback are providing the infrastructure behind the audience buzzer. If you haven’t come across Britain’s Got Talent, it’s a talent show where each of the judges has a buzzer, with which they can “buzz off” an act.

For the audience buzzer, everyone at home can buzz their own buzzer, and instantly see how many other buzzes the act has got. Since acts can be on stage for just 30 seconds or so, everything has to keep up with the TV show.

Oh, did I mention there’s 10 million people or more watching, any of whom might decide to buzz along? Predicting the numbers of people who will choose to play along is notoriously hard, but we ended up with a target peak rate of 50,000 requests per second into the servers.

This is a scarily big number: 50K/s is 180 million requests/hr or or nearly 130 billion/month. For comparison, YouTube does 85 billion pages/month, and Twitter a mere 5.8 billion.

So how did a small startup with 4 guys in London build something that could scale to be bigger than Twitter? (OK, OK, I know page views != requests, and the peak != sustained, and twitter is more complicated than a buzzer. But still, the peak rate is *high*).

The underlying technology stack is pretty standard stuff – HAProxy, Django, mod_wsgi, Apache, MySQL, memcached, all running on Amazon EC2. But scaling this standard stack up to these sorts of levels showed up a few problems, and some useful tools.

The next few posts will detail some of the pitfalls, tools and techniques when scaling up to tens of thousands of requests per second.


Scaling to 30K: Tsung

The first problem when building something to scale is testing it. Apache Bench (‘ab’) and JMeter are well known tools that can simulate loads, but both run into issues when simulating large numbers of users.

Tsung is a load testing tool that we used, and it scales effortlessly. Running across a few EC2 nodes, Tsung was easily able to generate tens of thousands of requests per second – and provide nice graphs to show what was going on.

Some notes from my experience with it:

  • Tsung is largely CPU bound on EC2, and due to the way Erlang SMP works it seems it’s not worth running on multiple core machines. Luckily m1.small nodes are cheap and single core – so we simply used more of these
  • If you’re seeing error_connect_emfile or other Erlang errors, you’re probably hitting Linux resource user limits. Editing /etc/security/limits.conf to increase the nofile limit to 50000 or so will solve the problem
  • We used Chef to automate creation of Tsung nodes, which makes scaling up the load generation trivially easy
  • Keep a close eye on the Tsung nodes to make sure they haven’t run out of steam. Often what looks like your system hitting a scaling limit is actually the Tsung nodes hitting their limits. Tell-tale signs are CPU hitting 100%, and the user generation rate fluctuating

Overall Tsung made generating a load of 30K requests/s a relatively simple process (at least compared with making the system cope with that load!). I don’t know why it’s not more widely used – I can only guess that being written in Erlang makes it seem a bit off-putting. But on a modern Linux system it compiled and installs without issue, and the Erlang underpinnings means it scales horizontally entirely smoothly.

Scaling to 30K: Two-level caches

Like everyone else, we use memcached to speed access to frequently used data. Getting started with memcached in Django is as easy as:

from django.core.cache import cache

cache.set('key', 'value')

Add a line to to point to your cache server, and you’re off. As ever, the Django docs are superb.

Memcached is very fast and handles lots of connections with ease. However, as we drove the scaling past 10K requests/sec, we noticed that the cache server holding the active buzzer data was becoming network-bandwidth limited. Since network bandwidth is one of the harder things to scale on EC2, we needed a way to keep the Django web tier from hitting the cache quite so hard.

The solution: implement a two-level cache, just like the one in your CPU. By putting a second, read-only cache on each of the Django boxes, a huge number of reads are serviced locally without hitting the network. Result: dramatically lower loads on the main cache servers, and the load on them only growing with write activity.

The trade-off is that the data sat in the caches on the web boxes can be out of date – and invalidating a cache key across all the caches is non-trivial. But even a very short cache expiry time (we use 1 second) saves a huge amount of reads, while keeping the stale data problem under control.

The code for the two-level cache is available on GitHub.

Using the TwoLayerCache

Using the two layer cache is simple:

  1. Download the code
  2. Put the file in your application
  3. Run memcached on each of your Django boxes
  4. Add the following to your
    LOCAL_CACHE_ADDR = ('',)
  5. Change your code from:
    from django.core.cache import cache


    from dualcache import cache
  6. That’s it.

Other than changing the code to point to the dual-level cache rather than Django’s standard cache, everything should keep working just as-is.

How it works

The basic idea is that any read operation is first tried against the local cache, and if it’s found the value is returned – without contacting the global cache, which is whatever you’ve set up as Django’s cache. Otherwise the global cache is queried and the value(s) or None returned as normal.
Writes and anything which might alter data are intercepted and the local copy deleted, so that future reads will re-fetch from the global cache.