The first problem when building something to scale is testing it. Apache Bench (‘ab’) and JMeter are well known tools that can simulate loads, but both run into issues when simulating large numbers of users.
Tsung is a load testing tool that we used, and it scales effortlessly. Running across a few EC2 nodes, Tsung was easily able to generate tens of thousands of requests per second – and provide nice graphs to show what was going on.
Some notes from my experience with it:
- Tsung is largely CPU bound on EC2, and due to the way Erlang SMP works it seems it’s not worth running on multiple core machines. Luckily m1.small nodes are cheap and single core – so we simply used more of these
- If you’re seeing error_connect_emfile or other Erlang errors, you’re probably hitting Linux resource user limits. Editing /etc/security/limits.conf to increase the nofile limit to 50000 or so will solve the problem
- We used Chef to automate creation of Tsung nodes, which makes scaling up the load generation trivially easy
- Keep a close eye on the Tsung nodes to make sure they haven’t run out of steam. Often what looks like your system hitting a scaling limit is actually the Tsung nodes hitting their limits. Tell-tale signs are CPU hitting 100%, and the user generation rate fluctuating
Overall Tsung made generating a load of 30K requests/s a relatively simple process (at least compared with making the system cope with that load!). I don’t know why it’s not more widely used – I can only guess that being written in Erlang makes it seem a bit off-putting. But on a modern Linux system it compiled and installs without issue, and the Erlang underpinnings means it scales horizontally entirely smoothly.