EVE Evolved: The war on lag

It seems as though everywhere I go to read about EVE Online, someone is complaining about lag. Throughout the game's seven-year history, developers and server engineers have waged a constant battle against the lag monster. Frequent upgrades and code overhauls have ensured that the capacity of each server cluster kept pace with the growing subscriber numbers. Then the Dominion expansion arrived, and something in it made lag a lot worse. The problem has yet to be corrected and has even spurred some players to put media pressure on CCP to fix it.

Until recently, the developers at CCP had been very quiet on the topic of lag and their efforts to combat it. Aside from the occasional fleet-fight mass testing event on the test server and the news that there was actually an entire team dedicated to lag, players were left largely in the dark as to what was being done to address the issue. In the absence of strong evidence to the contrary, many players began to assert that EVE's developers weren't working on lag at all. Earlier this week, we posted that CCP was planning a series of devblogs on lag to showcase the progress it's made. In a surprisingly rapid turn-around, four devblogs on lag and another on CCP's core technology groups have already been posted. They cover such topics as server scalability, the results of recent mass testing events, and CCP's new "thin client" testing tool.

In this week's EVE Evolved, I introduce each of CCP's four recent devblogs on lag with a quick summary.


CCP Tanis -- Mass-testing events


CCP Tanis wrote the first blog entry on mass testing events. The mass testing programme is one of the ways CCP has been trying to involve players directly in the development and bug-solving process. Before a new feature or a change to the code goes live, it's vital to test it under realistic gameplay conditions. This identifies potential bugs or performance issues in upcoming patches before they're applied to the main game server, which helps stop new features from inadvertently introducing further lag and game-breaking bugs.

Tanis explains that the mass testing events are used to gather performance trend data, to rapidly test high-priority changes such as critical bug-fixes, and to get important feedback on new features. Players attending the events not only have a hand in squashing bugs in EVE, but also have the opportunity to have their say on upcoming changes before the patch goes live. CCP tries to run a live testing event once every two weeks, and Tanis publishes the results of each session on the forum shortly after its conclusion.

Details recorded include the circumstances of the test, how close participating players thought the test was to real scenarios on the main game server, and what bugs or issues were under investigation. The main issues investigated so far have been jump-in lag, fleet lag and overview glitches. In the devblog, CCP Tanis posted a few graphs of server load during a recent mass testing event and explained what was going on in each instance. The insights CCP has gained from these tests have helped pinpoint issues with modules recycling during laggy periods. Tanis urges players to make an effort to turn up at mass testing events and help CCP in the war against lag.

CCP Warlock -- The long lag

CCP Warlock began his blog entry by describing his PhD work in getting robots to cooperate on a task without centralised control. He likened this to large fleet fights, in which players communicate and attempt to coordinate their actions to meet a common goal. As the number of people involved increases, the total amount of information they can communicate doesn't scale linearly; it depends on their organisational structure.

The same scalability issue exists in EVE's back-end server mechanics, and it may be a big factor in fleet battle lag. CCP Warlock believes that some of EVE's current mechanics have been built as simple hierarchical systems for ease of initial implementation, but that systems built in this manner don't tend to scale well beyond a certain point.
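As a rough illustration of why coordination doesn't scale linearly (this is a simplified model assumed for illustration, not one taken from CCP Warlock's devblog), compare the number of communication links needed when everyone talks to everyone with the number needed in a strict hierarchy. The function names are hypothetical.

```python
def pairwise_links(n: int) -> int:
    """Links needed if each of n participants talks directly to every other."""
    return n * (n - 1) // 2


def hierarchy_links(n: int) -> int:
    """Links needed if the n participants form a tree (one parent each)."""
    return max(n - 1, 0)


# Print group size, all-to-all links, and hierarchy links side by side.
for n in (10, 100, 1000):
    print(n, pairwise_links(n), hierarchy_links(n))
```

The hierarchy needs far fewer links, which is part of why it is the easy thing to build first. The catch is that everything funnels through the nodes at the top, so they can become the bottleneck as the group grows.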

The majority of the devblog focuses on the inherently complex issues of scaling systems on a distributed network like EVE's server clusters. If it shows anything, it's that tracking down the main cause of lag in EVE is a lot more complicated than many players believe. CCP clearly has some smart people working on the issue, but the war on lag definitely looks set to be a long one.

CCP Atropos -- The thin client

Among the many projects aimed at fixing lag, the thin client has perhaps the greatest potential. The thin client is just like a normal EVE client, except with all of the sound and graphics stripped out. The game is controlled from a command line interface, which takes up far fewer system resources than a standard client. As a result, the EVE developers can boot up dozens or hundreds of clients at a moment's notice and have them all log into the game.

Many of the problems relating to fleet lag are those of scalability, making manual testing tricky. At the moment, CCP relies on the hundreds of players involved in mass testing events to reproduce fleet lag in a semi-controlled environment. With the new thin client, however, that could be about to change. Commands can be issued to a large series of clients at once, allowing automated testing of issues that only occur on a high-load server node with hundreds of players fighting.
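To give a feel for what issuing one command to a large series of clients might look like, here is a minimal sketch. The `ThinClient` class, its `send` method and the `"jump"` command are all hypothetical stand-ins invented for illustration; the devblog doesn't describe the real tool's interface, and this stub only records commands rather than talking to a server.

```python
from concurrent.futures import ThreadPoolExecutor


class ThinClient:
    """Hypothetical stand-in for a headless client; not CCP's actual tool."""

    def __init__(self, client_id: int) -> None:
        self.client_id = client_id
        self.log: list[str] = []

    def send(self, command: str) -> str:
        # A real client would contact the server here; this stub only records.
        self.log.append(command)
        return f"client {self.client_id}: {command}"


def broadcast(clients: list[ThinClient], command: str) -> list[str]:
    """Issue the same command to every client concurrently."""
    with ThreadPoolExecutor(max_workers=32) as pool:
        # pool.map returns results in the same order as the input clients.
        return list(pool.map(lambda c: c.send(command), clients))


fleet = [ThinClient(i) for i in range(200)]
results = broadcast(fleet, "jump")
print(len(results))
```

Because the same script can be replayed exactly, a test like this can be repeated after every server change without waiting for hundreds of players to show up.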

The big advantage of this is that a change in the server code can be rapidly tested under the exact same controlled conditions without having to wait for one of the mass testing events. The faster turnaround on changes and precisely controlled circumstances of the test should make it possible to test lag-cutting optimisations on realistic scales. Perhaps more useful will be the ability to consistently reproduce bugs that only occur under certain rare conditions when the server is under heavy load, making bug-fixes in this area a lot more feasible. The thin client definitely looks like it will be a valuable weapon in the war on lag, but there's a lot of work still to be done on it.

CCP Atlas -- Character nodes

For most of us, the internal workings of the EVE server are a complete mystery. In our daily play sessions, we don't really need to know anything about SOL processes and market nodes. As players, we might not realise just how complex the server architecture is. In his devblog, CCP Atlas gave a run-down of EVE's server architecture and how its current load balancing mechanisms work.

The EVE server process is split into small functional units, such as the market in a given region or flight and combat in a given area of space. These units are distributed between the server's many CPU cores. If the load becomes too high on one CPU, units can be transferred off to other CPUs by adding more nodes. The problem is that when a single unit has a CPU to itself and still exceeds that core's maximum load, there's not much CCP can do about it. This effect can be seen in large fleet battles, as the combat occurring in an area of space is a single functional unit. It's not possible to split a single functional unit across multiple CPUs, which is why the game begins lagging horribly.
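The limitation described above can be sketched with a toy balancer. This is a simple greedy scheme assumed for illustration, not CCP's actual load-balancing algorithm, and the unit names and load figures are made up. Loads are expressed as a fraction of one core's capacity.

```python
def assign_units(unit_loads: dict[str, float], node_count: int) -> list[list[str]]:
    """Greedy balancing: heaviest unit first, each onto the least-loaded node."""
    nodes: list[list[str]] = [[] for _ in range(node_count)]
    totals = [0.0] * node_count
    for name, load in sorted(unit_loads.items(), key=lambda kv: -kv[1]):
        target = totals.index(min(totals))
        nodes[target].append(name)
        totals[target] += load
    return nodes


def overloaded_units(unit_loads: dict[str, float], capacity: float = 1.0) -> list[str]:
    """Units that exceed one node's capacity alone; extra nodes cannot help."""
    return [name for name, load in unit_loads.items() if load > capacity]


# Hypothetical loads, as a fraction of one core's capacity.
loads = {"market": 0.4, "quiet_system": 0.1, "mission_hub": 0.5, "fleet_fight": 1.8}
print(assign_units(loads, 3))
print(overloaded_units(loads))
```

In this example the fleet fight ends up with a node to itself and is still over capacity, and no amount of extra nodes changes that because the unit can't be split.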

CCP has been working on various ways of splitting areas of space and combat into smaller functional units, which would help with load balancing. One such optimisation that's already being rolled out onto the main EVE server is the introduction of "Character nodes." These are new types of server nodes that house all operations and data which are specific to a given character but not a given location. When a request comes into the server to look up data belonging to a character, it can be looked up on a character node, meaning your local location node has less work to do. This is a big step forward, and has already been used to increase the carrying capacity of Jita, EVE's biggest trade hub.
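A minimal sketch of the routing idea follows. The request kinds, node names and the modulo scheme for picking a character node are all assumptions made for illustration; CCP Atlas's devblog is the place to look for how the real server decides.

```python
# Hypothetical request kinds that depend on the character, not on location.
CHARACTER_SCOPED = {"skills", "wallet", "mail"}


def route(request_kind: str, character_id: int, system: str,
          character_node_count: int = 4) -> str:
    """Pick which node should answer a request (illustrative only)."""
    if request_kind in CHARACTER_SCOPED:
        # Character data goes to a character node, sparing the location node.
        return f"character-node-{character_id % character_node_count}"
    # Everything else still depends on where the character is in space.
    return f"location-node-{system}"


print(route("wallet", 9, "Jita"))
print(route("combat", 9, "Jita"))
```

The point is that every request answered by a character node is one the busy location node no longer has to handle.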

Summary

Everywhere I go, I see people demanding that CCP "just fix the lag." Through their recent devblogs, EVE Online's developers have been showing us just how complex the issue of lag is and what work they've been doing to help combat it. All of these devblogs are worth a full read, as there's a lot more to them than I've been able to fit in this week's column. More than anything, though, this recent flurry of devblogs makes me hopeful that we're seeing the start of a more active and communicative development process within CCP.


Brendan "Nyphur" Drain is an early veteran of EVE Online and writer of the weekly EVE Evolved column here at massively.com. The column covers anything and everything relating to EVE Online, from in-depth guides to speculative opinion pieces. If you have an idea for a column post or guide or just want to message him, send an e-mail to brendan@massively.com