Second Life A/B orientation testing, and how to mess it up

A/B testing has been used by Linden Lab for some time to gauge the effectiveness of the new user orientation experience in Second Life. A/B testing (also known as split testing) is a fairly straightforward system: one set of users is given the A experience (basically what already exists) as a control group, while a random sample is shunted to an alternative (the B experience) that differs in only one respect. The results of the two groups are then compared to look for a statistically significant improvement. Through a series of A/B tests, the particular elements that tend to produce the greatest improvements can be determined.
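The comparison step usually amounts to a two-proportion significance test on retention. Here is a minimal sketch; the function name and the sample numbers are hypothetical, chosen only to show the arithmetic.

```python
import math

def two_proportion_z(retained_a, n_a, retained_b, n_b):
    """Z-statistic for the difference between two retention rates."""
    p_a = retained_a / n_a
    p_b = retained_b / n_b
    # Pooled rate under the null hypothesis that A and B retain equally.
    p_pool = (retained_a + retained_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical data: 5,000 new users per arm; A retains 400, B retains 450.
z = two_proportion_z(400, 5000, 450, 5000)
print(round(z, 2))  # about 1.79: short of the ~1.96 needed at the 95% level
```

A |z| below roughly 1.96 means the observed "win" could easily be noise, which is exactly why a long series of small, disciplined trials matters.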

It's a great way of improving new user experience over time. Unfortunately, the efforts of Linden Lab's VTeam (in charge of the volunteer programs) create results that are not only misleading, but actually statistically worthless.



The staff directly involved with the new user experience set up A/B tests properly, creating a B experience that incorporates the one element they feel might improve on the standard A experience. The VTeam then undermines the resulting test by encouraging a reallocation of volunteer resources, urging as many volunteers as possible to cram into the B experience and to be as helpful as possible.

This drains away resources from the A experience, and artificially inflates the volunteer resources available at the B experience, causing an abnormal spike of volunteer activity that will not actually be reflected in the normal course of things.

It's like watching people to see which of two rooms they prefer, so you can determine which wallpaper is the more popular, except that someone crams bikini models into one room. Whatever results you get, they won't tell you much about wallpaper.

At the end of the test, volunteers return to their normal behavior patterns, and what looked like a slight win for the B experience likely delivers worse retention and a poorer overall experience than was measured during the test.
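That confound can be illustrated with a toy simulation. Everything here is hypothetical: the retention rates, the sample size, and the size of the volunteer effect are made up purely to show the shape of the problem.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def retention(n, base_rate, volunteer_boost=0.0):
    """Count how many of n new users stick around.

    base_rate is the retention probability of the orientation design
    itself; volunteer_boost models the extra help temporarily crammed
    into an experience during the test window.
    """
    rate = base_rate + volunteer_boost
    return sum(random.random() < rate for _ in range(n))

# Hypothetical numbers: B's design is actually slightly worse than A's
# (7% vs. 8% base retention), but the volunteers piled into B add a
# temporary 3-point boost while the test runs.
n = 5000
a_test = retention(n, 0.08)                          # control, as-is
b_test = retention(n, 0.07, volunteer_boost=0.03)    # inflated during the test
b_live = retention(n, 0.07)                          # boost gone afterwards

print(a_test, b_test, b_live)
```

With these made-up rates, B tends to "win" while the volunteers are present and to lose once they leave: the measurement captures the volunteers, not the design.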

"I cannot teach him. The boy has no patience."

While A/B testing is simple enough to accomplish, the fundamental problem is that Linden Lab seems to lack the resources (or the inclination) to perform the continuous minor trials that give useful statistical results. The urge is always there to make large-scale changes, and then to fling volunteer resources at these completely different, multivariate experiences in an effort to simulate a 'normal' orientation experience. Which it doesn't. It can't.

While A/B testing has been going on for a very long time, and despite a succession of B experiences winning out, Linden Lab says that retention rates have simply not risen, and has indeed suggested that they may have gone the other way. That isn't surprising.

Self-discipline, patience and persistence are the hallmarks of useful A/B testing. Without them, the results are governed by chance. You might get lucky. Do you feel lucky?
