Monthly Archives: May 2011

Slides from Big Data meetup

Slides from my talk at the first Big Data London Meetup are here: http://www.slideshare.net/malcolmbox/scaling-the-britains-got-talent-buzzer

They were written for speaking to rather than reading, so apologies if they don’t make sense in isolation. I don’t think there’s any video of the event available online.

Speaking at the London Big Data meetup

I’m looking forward to the first London Big Data Meetup on Wednesday 25th. I’ll be speaking about our experiences building and operating the Britain’s Got Talent buzzer for ITV, and hoping to hear about lots of other cool things people are doing with Big Data.

Shipping sucks

We pushed Tellybug into the app store last week, and over the weekend it went live. Today the Guardian covered the app, and we started to pick up users. Great! Isn’t this the dream of every startup, the goal of every engineer: product shipped?
Of course it is. But still, shipping sucks.

Here’s just some of the ways shipping sucks:

  1. It’s just not ready! Whenever you ship, you know all the ways things could be improved, all the rough edges, all the places where it doesn’t live up to your original vision.
  2. Users! Now you have users, each with their own views on your masterpiece. And since this is v1, most of them won’t be complimentary.
  3. No users! Even worse than users is having no-one admiring your new baby. Why aren’t people flocking to use this revolutionary new product?
  4. Taxes! If you’ve been avoiding paying your programming taxes, this is where it all comes back to bite you. No monitoring? No crash logging/reporting? Oh dear.
  5. Marketing! It’s no longer enough to sweat over code; now you need to promote your new product left, right and centre. And still sweat over the code.

See? Shipping sucks. But there’s something that sucks worse: not shipping.

So I’m glad we shipped Tellybug, even though now there’s a million things to do to keep it running and fix all the things that we didn’t get done for v1.

Shipping a social app

This week we shipped Tellybug. This isn’t the first product I’ve shipped, but somehow this one feels different.

Before this, I’ve shipped software that’s used in hundreds of millions of phones, shipped complete phone projects, shipped marketing programs and shipped iPhone apps and web sites. None of these have felt like shipping Tellybug, our new social TV app.

So what makes shipping Tellybug feel so different?

There are plenty of candidates: shorter delays between finishing the code and having it in users’ hands; being part of a much smaller team; the joy of seeing something new come to life.

But I think the real reason is that Tellybug is social, and that means I can see what our users are doing, and even their faces, in the app itself.

So each time I load up Tellybug now, I get an “oh wow” moment as I virtually meet all these lovely users. I’ve never had that with a previous product, and it feels different.

Scaling to 30K: HAProxy on EC2

(This is the third part of the Scaling Django to 30K requests/s series)

We use HAProxy on EC2 instances to load-balance the incoming HTTP requests across the web server boxes. Amazon provide the Elastic Load Balancer (ELB) service, which does a similar thing, so why did we run our own?

The biggest difficulty for us with ELB is that our traffic peaks very quickly when the TV show is on. For example, on Saturday between 3 minutes before Britain’s Got Talent and 2 minutes into the show, the load on our servers tripled, and it peaked 10x higher. ELB provides nearly infinite capacity, but takes tens of minutes to scale up – too slow for our needs.

Using our own HAProxy nodes lets us pre-scale to cope with the expected peak demand, while dynamically scaling the web layer below.
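As a rough illustration, a minimal HAProxy setup of this shape might look like the sketch below. This is not our production config: the hostnames, addresses, connection limits and timeouts are hypothetical, and the real tuning depends on the expected peak.

```
global
    maxconn 50000          # pre-scaled up front for the expected broadcast peak

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind *:80
    default_backend web

backend web
    balance roundrobin
    # Web nodes are added or removed here as the web layer scales dynamically
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check
```

The key point is that the frontend capacity (`maxconn` and the number of HAProxy nodes) is fixed ahead of the show, while the `backend` server list is what changes as web instances come and go.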

One thing there doesn’t seem to be much information on is the peak load a single EC2 node can handle. Our testing showed that a c1.medium could handle approximately 5,000 incoming connections per second. An m1.small handled somewhat less, but larger node sizes didn’t provide an increase. It seems there’s some EC2 limit – network, hypervisor or something else – that means more than 5K/s per node isn’t achievable.