Ever since then, CCP has waged a largely unseen war against the seeming impossibility of keeping all of EVE's players in a single-shard universe. Holding on to the core ideal that has made EVE the successful sandbox game it is today, developers have pursued every avenue in the fight against lag. While funding research into StacklessIO for Stackless Python and constantly optimising code, CCP built the biggest supercomputer in the games industry to house New Eden's growing population. With over 400,000 players now inhabiting the same world and a typical weekly peak concurrency of over 50,000 characters, CCP has been forced to develop some big guns in the war on lag.
In this week's EVE Evolved, I look at some of the biggest developments CCP has made in the war on lag, including the new Time Dilation feature that literally slows down time to let the server catch its breath.
The thin blue lag
For most of EVE's life, server lag could only really be monitored when it occurred on the live server. Because the logging systems themselves added to the server load, logging had to be enabled manually on individual nodes to capture a useful amount of information. Certain conditions only arise on the live server, when hundreds or thousands of players gather in one place to battle over a system, and mass-testing events on the test server often failed to reproduce bugs that plagued the live server. This time last year, CCP Atropos published a devblog on a tool designed to solve this intractable live-testing problem.
Developed in-house for debugging EVE's high-lag scenarios, the thin client is essentially a normal game client with the graphics stripped out and a command-line interface built in their place. Hundreds of these clients can be run on cheap hardware, and their actions can be coordinated by issuing a single command to every client at once. If there's a fatal but rare bug that only happens when hundreds of cruise missiles are smartbombed or a thousand turrets fire on a starbase at the same time, it can now be reproduced and fixed on the test server before it goes live. Since its development, the thin client has been instrumental in tracking down and fixing issues with missile lag and in testing optimisations designed to reduce it.
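To illustrate the coordination idea, here is a minimal sketch of broadcasting one command to hundreds of headless clients at the same moment. It assumes nothing about CCP's actual tool: `ThinClient`, `run_command` and `broadcast` are hypothetical names, and the "command" is just recorded locally rather than sent to a server.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ThinClient:
    """Hypothetical headless client: no graphics, only a command interface."""
    name: str
    log: list = field(default_factory=list)

    async def run_command(self, command: str) -> str:
        # A real client would send this to the game server; here we just
        # yield to the event loop and record it so the sketch stays self-contained.
        await asyncio.sleep(0)
        self.log.append(command)
        return f"{self.name}: {command} ok"


async def broadcast(clients: list, command: str) -> list:
    """Issue the same command to every client concurrently; results keep client order."""
    return await asyncio.gather(*(c.run_command(command) for c in clients))


clients = [ThinClient(f"client-{i:03d}") for i in range(500)]
results = asyncio.run(broadcast(clients, "fire_missiles target=starbase"))
print(len(results))  # 500
print(results[0])    # client-000: fire_missiles target=starbase ok
```

Because every client receives the command in the same event-loop pass, the server sees a burst of near-simultaneous actions, which is the kind of load a rare fleet-battle bug needs in order to surface.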
If I asked most EVE players who or what RAD Game Tools was, I'd probably get a lot of blank stares and shrugged shoulders in response. RAD Game Tools is a small software development firm in Washington that most people have never heard of, and yet almost every game published today uses something the firm created. Earlier this year, CCP licensed RAD's Telemetry server profiler for use on EVE's main Tranquility game server. The program provides millisecond-accurate logs of exactly how much time is spent performing each server function, and it allows developers to visualise that data in an incredibly useful manner.
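The core idea behind a per-function profiler can be shown with a small timing "zone" that accumulates how long each named block of code takes. This is only a sketch of the concept and is not Telemetry's API; `zone`, `report` and the two server functions are hypothetical stand-ins.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Total seconds spent inside each named zone, and how many times it was entered.
zone_totals = defaultdict(float)
zone_counts = defaultdict(int)


@contextmanager
def zone(name: str):
    """Time the enclosed block and add it to the running total for `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        zone_totals[name] += time.perf_counter() - start
        zone_counts[name] += 1


def report() -> list:
    """Zones sorted by total time spent, most expensive first."""
    return sorted(
        ((name, zone_totals[name], zone_counts[name]) for name in zone_totals),
        key=lambda row: row[1],
        reverse=True,
    )


# Hypothetical server work: one expensive function and one cheap one.
def process_missiles():
    with zone("process_missiles"):
        sum(i * i for i in range(200_000))


def update_positions():
    with zone("update_positions"):
        sum(range(20_000))


for _ in range(5):
    process_missiles()
    update_positions()

for name, total, count in report():
    print(f"{name:<18} {total * 1000:8.2f} ms over {count} calls")
```

Sorting by total time is what turns raw timings into a to-do list: the zone at the top of the report is where an optimisation saves the most.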
Before Telemetry, the causes of lag and node deaths in an overloaded fleet battle were difficult to pin down. Developers had to pick through incomplete logs and player reports by hand to form a sketchy picture of what was happening when lag set in. Telemetry gives developers a vital window into events as they unfold, highlighting which procedures the server spends most of its time on during high-lag situations. This informs optimisation by showing where the biggest savings can be made for fleet battles. Combined with the thin client, it lets developers accurately simulate the server conditions of a fleet battle and field-test optimisations before they go live.
Slowing down time
CCP Veritas has been crawling around in the depths of EVE's back-end server code since April 2010, and he really doesn't get enough credit for the incredibly complex work he does. Around two months ago, he published a devblog on a potentially revolutionary and extremely elegant solution for EVE's growing fleet lag conundrum. The ominously named "time dilation" is a system in which high lag causes time within the game to literally slow down. In fleet battles, lag sets in because the server has a maximum number of commands it can reasonably process per second, and once the incoming commands exceed this value, they begin to queue up in an ever-increasing list.
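The queue build-up described above is easy to model: if commands arrive faster than the server can process them, the leftover work carries into the next second and the backlog keeps growing. The sketch below uses made-up per-second figures (1,200 arriving, 1,000 processed) purely for illustration; they are not real Tranquility numbers.

```python
def simulate_backlog(arrivals_per_sec: int, capacity_per_sec: int, seconds: int) -> list:
    """Return the queue length at the end of each simulated second."""
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += arrivals_per_sec                 # new commands join the queue
        backlog -= min(backlog, capacity_per_sec)   # server drains what it can this second
        history.append(backlog)
    return history


def wait_for_new_command(backlog: int, capacity_per_sec: int) -> float:
    """Seconds a newly issued command waits behind the existing backlog."""
    return backlog / capacity_per_sec


overloaded = simulate_backlog(1200, 1000, 5)
print(overloaded)                                  # [200, 400, 600, 800, 1000]
print(wait_for_new_command(overloaded[-1], 1000))  # 1.0
print(simulate_backlog(800, 1000, 3))              # [0, 0, 0]
```

Below capacity the queue empties every second; above it, the backlog and therefore each new command's delay grow steadily for as long as the overload lasts.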
Each command takes progressively longer to work its way through the queue, and the game grinds to a halt as a result. Because some non-critical commands can yield to more essential ones, some players feel the lag more than others. With time dilation, the game's tick rate slows down as the node approaches its maximum processing limit, so repeated actions like guns firing, ships moving and physics collisions are processed less frequently in real time. The dilation only affects the area of the game that's under heavy load, and as players kill each other or leave the field of play, the server will invariably recover and time will return to normal. Veritas published an early test of the system this week, with promising results.
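A simplified model shows why slowing game time relieves the server: if game time runs at a fraction of real time, every repeating action fires proportionally less often per real second. The proportional rule and the 0.1 floor below are assumptions for illustration only, not CCP's actual dilation formula; `dilation_factor` and `shots_per_real_minute` are hypothetical names.

```python
def dilation_factor(load: float, capacity: float, floor: float = 0.1) -> float:
    """Fraction of normal game speed to run at, given demand vs. capacity.

    `load` and `capacity` share units (e.g. commands per second). At or below
    capacity the game runs at full speed; above it, game time slows in
    proportion so the server only does `capacity` worth of work per real
    second. `floor` (an assumed value) stops time slowing without limit.
    """
    if load <= capacity:
        return 1.0
    return max(floor, capacity / load)


def shots_per_real_minute(cycle_time_s: float, factor: float) -> float:
    """How often a weapon with a fixed game-time cycle fires per real minute."""
    game_seconds_per_real_minute = 60.0 * factor
    return game_seconds_per_real_minute / cycle_time_s


print(dilation_factor(800, 1000))         # 1.0  (no dilation needed)
print(dilation_factor(2000, 1000))        # 0.5  (time runs at half speed)
print(dilation_factor(50_000, 1000))      # 0.1  (clamped at the floor)
print(shots_per_real_minute(5.0, 1.0))    # 12.0
print(shots_per_real_minute(5.0, 0.5))    # 6.0
```

In this model, halving game speed halves the real-time rate of every gun cycle, so the work arriving at the node drops back to what it can handle instead of piling up in the queue.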
The thin client and Telemetry profiler have both proven to be excellent tools in the fight against lag, helping CCP keep EVE Online the single-shard sandbox universe it has always been. The absolutely inspired time dilation concept works well in theory, and early tests are very promising. It's easy to forget about all the work that goes on in the background just to keep an expanding game like EVE running smoothly, but if it weren't for people like the developers on Team Gridlock, the game would quite literally grind to a halt.
Brendan "Nyphur" Drain is an early veteran of EVE Online and writer of the weekly EVE Evolved column here at Massively. The column covers anything and everything relating to EVE Online, from in-depth guides to speculative opinion pieces. If you have an idea for a column or guide, or you just want to message him, send an email to email@example.com.