Raspberry Pi ordered!

Have just placed my order for a Raspberry Pi. Haven’t been this excited about getting a new computer/gadget since Lego Mindstorms.

My current favourite hack for it is to port a BBC B emulator to it, then wire it up inside an old Beeb box and use it to play HD Elite on the big TV. So 1980s.

Speaking at Cassandra Europe

I’ll be speaking at Cassandra Europe on March 28th 2012. I’m in the “Case Studies” track, sharing our experiences using Cassandra to power high-load applications like X Factor and Britain’s Got Talent.

It looks like this is going to be a great conference for anyone using Cassandra in Europe, and I look forward to hearing more about what people are doing, and meeting some others using Cassandra in weird and wonderful ways.

A post a day

This post is about regular posting – and changing behaviour. I believe under international law all bloggers are allowed one post about how difficult posting regularly is – this is my contribution to the oeuvre!

A while back I resolved to post more regularly, but as the evidence here shows, that resolution was a failure. Then I recently finished reading Switch: How to Change Things When Change Is Hard. Becoming a regular poster certainly seemed hard, and I knew I’d failed before. Could Switch help me?

A few key ideas seemed like they could help:

  1. Shrink the Challenge. Instead of trying to post regularly forever, commit to doing it for a week.
  2. Black-and-white goal. Instead of “posting regularly”, change the goal to posting every work day. No wiggle room there!
  3. Celebrate progress. Note I’m posting this on Wednesday, having already made 2 posts this week. Yay, I’m 40% of the way there!
  4. Instant habits. Set an “action trigger” for when I’m going to write a post each day. For me, it’s after each day’s scrum meeting.

I don’t yet know if I will succeed in changing into a regular blogger, but so far the techniques seem to be working. More on Switch in a future post…

Memcached can count – pylibmc can’t

In my recent talk to Big Data London I said that memcached can’t count. Turns out that’s not really true, and the problem lies somewhere else.

Here’s the original evidence that memcached has counting issues, using Python & pylibmc:

In [2]: cache.set('wibble', 0)

In [3]: cache.get('wibble')
Out[3]: 0

In [4]: cache.decr('wibble', 1)
Out[4]: 0L

In [5]: cache.incr('wibble', 1)
Out[5]: 1L

In [6]: cache.incr('wibble', -1)
Out[6]: 4294967296L

Incrementing by -1 has instead added 2^32-1 (1 + 4294967295 = 4294967296): the -1 has been reinterpreted as an unsigned 32-bit number, a classic signed/unsigned conversion problem.
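Here is a minimal sketch of that arithmetic, assuming (from the 2^32-1 figure alone) that the delta was squeezed into an unsigned 32-bit field somewhere on the client side. The helper name `as_uint32` is made up for illustration and is not pylibmc’s actual code path. Because memcached’s counter is 64 bits wide, adding 4294967295 to 1 doesn’t wrap, which is why Out[6] shows exactly 2^32.

```python
# Hypothetical illustration: how a signed delta of -1 becomes 2**32 - 1
# if it is reinterpreted as an unsigned 32-bit value before being sent.
UINT32_MASK = 0xFFFFFFFF


def as_uint32(n: int) -> int:
    """Reinterpret a Python int as an unsigned 32-bit value (two's complement)."""
    return n & UINT32_MASK


delta = as_uint32(-1)
print(delta)      # 4294967295, i.e. 2**32 - 1
print(1 + delta)  # 4294967296, the value seen in Out[6]
```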

But hang on – according to the memcached protocol spec, the incr delta is a 64-bit unsigned integer, so it’s an error to send a negative number.

Sure enough, using the ASCII protocol (with memcached 1.4.9):

get wibble
VALUE wibble 0 1
incr wibble -1
CLIENT_ERROR invalid numeric delta argument
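To make the server-side rules concrete, here is a rough Python model of how the ASCII protocol treats incr/decr. It is a sketch based on the documented behaviour (the delta must be the decimal form of an unsigned 64-bit integer, incr wraps around at 2^64, decr bottoms out at 0) plus the error message shown above; it is not memcached’s own implementation, and `apply_incr_decr` and `ClientError` are hypothetical names.

```python
# A rough model of memcached's ASCII incr/decr rules; not memcached's code.
UINT64_MAX = 2**64 - 1


class ClientError(Exception):
    """Stands in for a CLIENT_ERROR response line."""


def parse_delta(token: str) -> int:
    # The delta must be an unsigned decimal number, so "-1" is rejected outright.
    if not (token.isascii() and token.isdigit()):
        raise ClientError("invalid numeric delta argument")
    value = int(token)
    if value > UINT64_MAX:
        raise ClientError("invalid numeric delta argument")
    return value


def apply_incr_decr(current: int, op: str, token: str) -> int:
    delta = parse_delta(token)
    if op == "incr":
        # Overflow wraps around at the 64-bit boundary.
        return (current + delta) % 2**64
    if op == "decr":
        # Underflow is caught: the counter never goes below 0.
        return max(current - delta, 0)
    raise ValueError(f"unknown op: {op}")
```

With this model, `apply_incr_decr(0, "decr", "1")` returns 0 (matching Out[4] above), while `apply_incr_decr(1, "incr", "-1")` raises `ClientError`, just as the real server refuses the request.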

It seems memcached can count after all, or at least refuse to do operations it might get wrong. It’s another question why memcached won’t deal with signed numbers…

So where’s the weird behaviour above coming from? A couple of bugs in pylibmc – bug 74 and bug 73.

The good news? Both bugs are now fixed in the latest GitHub version, so very soon memcached and pylibmc will both be able to count. Three cheers – hooray, hooray!
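If you are stuck on an older pylibmc until the fix is released, one possible workaround is to never send a negative delta at all: route negative values to decr instead. The sketch below assumes a client exposing `incr(key, delta)` and `decr(key, delta)` (as `pylibmc.Client` does); `add_signed` and the in-memory `FakeCounterCache` are hypothetical, the latter only there so the example runs without a memcached server. Note that memcached clamps decr at 0, so the counter still can’t go negative.

```python
# Hypothetical helper: route a signed delta to incr or decr so the client
# never sends a negative number to memcached.
def add_signed(cache, key, delta: int) -> int:
    """Add a signed delta to a counter via incr/decr.

    `cache` is assumed to expose incr(key, delta) and decr(key, delta).
    """
    if delta >= 0:
        return cache.incr(key, delta)
    return cache.decr(key, -delta)


class FakeCounterCache:
    """Tiny in-memory stand-in with memcached-like counter semantics."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = int(value)
        return True

    def get(self, key):
        return self._data.get(key)

    def incr(self, key, delta=1):
        if delta < 0:
            raise ValueError("delta must be unsigned")
        # Wrap around at 2**64, like memcached's incr.
        self._data[key] = (self._data[key] + delta) % 2**64
        return self._data[key]

    def decr(self, key, delta=1):
        if delta < 0:
            raise ValueError("delta must be unsigned")
        # Clamp at 0, like memcached's decr.
        self._data[key] = max(self._data[key] - delta, 0)
        return self._data[key]


cache = FakeCounterCache()
cache.set("wibble", 0)
add_signed(cache, "wibble", 1)   # 1
add_signed(cache, "wibble", -1)  # 0, not 4294967296
```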


UX Sketching – seeing inside the minds of your users

Yesterday I went to a User Experience Sketching session hosted by Devin Hunt from Lyst. I had no idea what UX Sketching was about, but it sounded interesting and some people I follow on Lanyrd were going, so what the heck.

If you haven’t heard of UX Sketching either, it turns out to be a way to see inside the minds of your users and experience your site/product through their eyes. Even better, it’s really simple:

  1. Find a user
  2. Ask them to quickly sketch the page/feature/item of interest
  3. Look at what they draw – and what they don’t draw
  4. Repeat until you’ve got a bunch of data points

By getting the user to draw from memory, you find out what’s most important/salient to them. One example from Devin – on Lyst they had a “Love” feature. Click metrics said people were clicking on it – but it never showed up in sketches. Turns out when they dropped the feature, no-one noticed – it was only being clicked because it was there, not because it was useful.
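One way to turn a pile of sketches into data points is to record which page elements each user drew, compute how often each element shows up, and compare that with click metrics to spot “clicked but not remembered” features like the Love button. This is a sketch of that idea, not how Lyst did it; the function names, thresholds and example data below are all made up for illustration.

```python
from collections import Counter


def salience(sketches):
    """Return the fraction of sketches in which each page element appears.

    Each sketch is recorded (by hand, after looking at the drawing) as a
    collection of the element names the user drew.
    """
    counts = Counter()
    for drawn in sketches:
        counts.update(set(drawn))  # count each element once per sketch
    total = len(sketches)
    return {element: counts[element] / total for element in counts}


def clicked_but_not_drawn(sketch_share, click_share, min_clicks=0.05, max_sketch=0.1):
    """List elements with a noticeable share of clicks that rarely appear
    in sketches. The thresholds are arbitrary placeholders."""
    return sorted(
        element
        for element, clicks in click_share.items()
        if clicks >= min_clicks and sketch_share.get(element, 0.0) <= max_sketch
    )


# Made-up example data, purely for illustration.
sketches = [
    {"logo", "search", "product image"},
    {"search", "product image", "price"},
    {"product image", "price"},
]
clicks = {"search": 0.30, "love": 0.12, "price": 0.20}

print(clicked_but_not_drawn(salience(sketches), clicks))  # ['love']
```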

So when is this tool useful? You need users who already have some familiarity with the site/feature. It’s no use if they haven’t seen it, so it can’t kick off the design process for a brand-new feature.

What it can do is tell you where your design is working, or where it’s not. Perhaps you want reviews to be central to a buying page, but users aren’t sketching them. That could trigger a round of design work, mockups, A/B testing etc. to see what makes them more noticeable. Maybe you’re trying to figure out where on a page things should go – looking at where users sketch them will tell you where they intuitively think they should go.

I think seeing through others’ eyes is one of the hardest things to do in design, so this seems like a great, quick and easy way to do it.

Now off to get some users sketching…

Talk at London Big Data meetup

I spoke at the London Big Data meetup a couple of weeks ago about counting, and how difficult it is if you want to count things very very fast. My slides from this are now up here and on Slideshare.

More on the joy of counting soon, including why Memcached can’t count.

Behind the scenes: Using Cassandra & Acunu to power Britain’s Got Talent

In some previous posts, I’ve talked about how we scaled Django up to cope with the loads for Britain’s Got Talent. One area I haven’t talked about yet is the database.

For BGT, we were planning for peak voting loads of 10,000 votes/second. Our main database runs on MySQL using the Amazon Relational Database Service. Early testing showed there was no way we could hit that level using RDS – we were maxing out at around 300 votes/s on an m1.large database instance. Even though there are larger instances, they’re not 30x bigger, so we knew we needed to do something different.
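Putting the gap in numbers, using only the two figures above:

```latex
\frac{10\,000\ \text{votes/s}}{300\ \text{votes/s}} \approx 33
```

In other words, even assuming throughput scaled linearly with instance size, a single RDS instance would need to be roughly 33 times faster than an m1.large.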

We knew that various NoSQL databases would be able to handle the write load, but the team had no experience in operating NoSQL clusters at scale. We had less than 2 weeks before first broadcast, and all the options available were both uncertain and high risk.

Then a mutual friend introduced us to Acunu. Not only do they know all about NoSQL, they also have a production-grade Cassandra stack, built on their unique storage engine, that runs on EC2. Tom and the team at Acunu quickly did some benchmarking on EC2 to show that the write volume we were expecting would be easy to handle, and tested the Python bindings for Cassandra. That gave us confidence that this would scale to the loads we were expecting, with plenty of headroom if things went mental.

We wired Cassandra into our stack, and started load testing against a 2-node Cassandra cluster. While we’d originally expected to need more nodes, we found that the cluster was easily able to absorb the load we were testing with, thanks to the optimisations in the Acunu stack.

So how did it all go? Things were tense as the first show was broadcast and we saw the load starting to ramp up, but the Acunu cluster worked flawlessly. As we came towards the start of the live shows, we were totally comfortable that it was all working well.

Then AWS told us that the server hosting one of the Cassandra instances was degraded and might die at any point. Just before the first live finals. We weren’t too worried as adding a new node to a cluster is a simple operation. We duly fired up a new EC2 instance and added it to the cluster.

Then things went wrong. For some reason, the new node didn’t integrate properly into the cluster and now we had a degraded cluster that couldn’t be brought back online. And only a few hours until showtime. I love live TV!

The team at Acunu were fantastic in supporting us (including from a campsite in France!) both to set up a new cluster and to diagnose the problem with the degraded cluster. For the show, we switched over to the new cluster as we still hadn’t been able to figure out what was wrong with the old one (it turned out to be a rare bug in Cassandra).

Thankfully the shows went off without a hitch and no-one saw the interesting juggling act going on to keep the service running.

So a big thank you to the team at Acunu for their help “behind the scenes” at BGT – we couldn’t have done it without them.