Facebook and NYU trained an AI to estimate COVID outcomes

The machine learning model is as accurate as a human radiologist.

COVID-19 has infected more than 23 million Americans and killed 386,000 of them since the global pandemic began last March. Complicating the public health response is the fact that we still know so little about how the virus operates -- such as why some patients remain asymptomatic while it ravages others. Effectively allocating resources like ICU beds and ventilators becomes a Sisyphean task when doctors can only guess as to who might recover and who might be intubated within the next 96 hours. However, a trio of new machine learning algorithms developed by Facebook’s AI division (FAIR) in cooperation with NYU Langone Health can help predict patient outcomes up to four days in advance using just a patient’s chest x-rays.

The three models can, respectively, predict patient deterioration based on a single X-ray, predict deterioration based on a sequence of X-rays, and determine how much supplemental oxygen the patient will likely need. “These predictions could help doctors avoid sending at-risk patients home too soon, and help hospitals better predict demand for supplemental oxygen and other limited resources,” FAIR researchers wrote in a Friday blog post.

“COVID is a unique virus,” Dr. William Moore from NYU Langone Health told Engadget. Most viruses attack the respiratory bronchioles, which results in a pneumonia-like area of increased density, he explained. “But what you won’t usually see is a tremendous amount of hazy density.” However, that’s exactly what doctors are finding with COVID patients. “They'll have increased density that appears to be a pneumonitis inflammatory process rather than a typical bacterial pneumonia, which is a more dense area and in one specific spot. [COVID] seems to be bilateral; it seems to be somewhat symmetric.”

When the outbreak first reached New York City, “we started trying to figure out what to do, how we could actually help manage the patients,” Moore continued. “So there were a couple things that were going on: there's a tremendous number of patients coming in, and we had to figure out ways to predict what was going to happen [to them].”

To do so, the NYU-FAIR team began with chest x-rays. As Moore notes, x-rays are performed regularly, basically whenever patients come in complaining of shortness of breath or other symptoms of respiratory distress, and are ubiquitous at rural community hospitals and major metropolitan medical centers alike. The team then developed a series of metrics by which to measure complications as well as the patient’s progression from ICU admission to ventilation, intubation, and potential mortality.

“That's another clear demonstrable metric that we could use,” Moore explained regarding patient deaths. “Then we said ‘okay, let's see what we can use to predict that,’ and of course the chest X-ray was one of the things that we thought would be super important.”

Once the team had established the necessary metrics, they set about training the AI/ML model. However, doing so raised another challenge. “Because the disease is new and the progression of it is nonlinear,” Facebook AI program manager Nafissa Yakubova, who had previously helped NYU develop faster MRIs, told Engadget, “it makes it difficult to make predictions, especially long-term predictions.”

What’s more, at the outset of the epidemic, “we did not have COVID data sets, there were especially no datasets labeled [for use in training an ML model],” she continued. “And the size of the datasets were quite small as well.”

So the team did the next best thing: they “pretrained” their model on larger, publicly available chest x-ray databases, specifically MIMIC-CXR-JPG and CheXpert, using a self-supervised learning technique called Momentum Contrast (MoCo).
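For readers curious what Momentum Contrast involves, here is a minimal sketch of its two core ingredients as described in the original MoCo research: a "key" encoder that slowly tracks the "query" encoder through a momentum update, and a contrastive loss that rewards a query for matching its own key more closely than a set of negatives. This is an illustration only, not the FAIR-NYU code. The function names, the toy two-dimensional vectors, and the values of `m` and `temperature` are assumptions chosen for the example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length so dot products act as cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def momentum_update(key_params, query_params, m=0.999):
    """Move the key encoder's weights a small step toward the query encoder's.

    key <- m * key + (1 - m) * query. The value of m is illustrative.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def contrastive_loss(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE-style loss: low when the query is closest to its positive key."""
    q = normalize(query)
    logits = [dot(q, normalize(positive_key)) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negative_keys]
    # Numerically stable log-sum-exp over the positive and all negatives.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denominator - logits[0]


# Toy embeddings (hypothetical): two views of the same x-ray should land close
# together, while embeddings of other x-rays act as negatives.
view_a = [1.0, 0.0]
view_b = [0.9, 0.1]
other_xrays = [[0.0, 1.0], [-1.0, 0.0]]

aligned_loss = contrastive_loss(view_a, view_b, other_xrays)
mismatched_loss = contrastive_loss(view_a, other_xrays[0], [view_b, other_xrays[1]])
```

Because no labels appear anywhere in the loss, this style of training can draw on large archives of unlabeled x-rays, which is what makes it attractive when labeled COVID data is scarce.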

Basically, as Towards Data Science’s Dipam Vasani explains, when you train an AI to recognize specific things -- say, dogs -- the model has to build up to that ability through a series of stages: first recognizing lines, then basic geometric shapes, and then more detailed patterns, before being able to tell a Husky from a Border Collie. What the FAIR-NYU team did was take the first few stages of their model and pretrain them on the larger public datasets, then go back and fine-tune the model using the smaller, COVID-specific dataset. “We're not making the diagnosis of COVID -- if you have a COVID or not -- based on an x-ray,” Yakubova said. “We are trying to predict the progression of how severe it might be.”
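The pretrain-then-fine-tune idea can be sketched in a few lines. The example below shows the simplest variant of it: the "pretrained" early stages are frozen and only a small prediction head is trained on a tiny labeled set. The real models may well update more of the network than this, and everything here is a stand-in: `pretrained_features` is a hypothetical fixed function in place of a learned network, and the four-pixel "images" and their labels are invented toy data.

```python
import math


def pretrained_features(image):
    """Hypothetical stand-in for frozen, pretrained early layers.

    A real model would use learned filters; this toy version just reports the
    mean brightness and the brightness range of the pixel list.
    """
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return [mean, spread]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(images, labels, epochs=500, lr=0.5):
    """Train only a logistic-regression head on top of the frozen features."""
    # Features are computed once because the extractor never changes.
    features = [pretrained_features(img) for img in images]
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            prediction = sigmoid(dot_with_bias(weights, x, bias))
            error = prediction - y  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def dot_with_bias(weights, x, bias):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(image, weights, bias):
    """Estimated probability of the positive label for one image."""
    return sigmoid(dot_with_bias(weights, pretrained_features(image), bias))


# Invented toy data: label 1 marks images from patients who later deteriorated.
small_dataset = [
    [0.1, 0.2, 0.1, 0.2],
    [0.2, 0.1, 0.2, 0.2],
    [0.8, 0.9, 0.7, 0.8],
    [0.9, 0.8, 0.9, 0.7],
]
small_labels = [0, 0, 1, 1]

head_weights, head_bias = fine_tune_head(small_dataset, small_labels)
```

The design point is that the expensive part (learning good features) happens once on the big public datasets, while the cheap part (fitting the head) needs only a handful of labeled local examples.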

“The key here I think was... using a series of images,” she continued. When a patient is admitted, the hospital will take an x-ray and then likely take additional ones in the coming days, “so you have this time series of images, which was key to having more accurate predictions.” Once fully trained, the FAIR-NYU model managed around 75 percent predictive accuracy -- on par with, and in some cases exceeding, the performance of human radiologists.

This is a clever solution for a number of reasons. First, the initial pretraining is extremely resource-intensive -- to the point that it’s simply not feasible for individual hospitals and health centers to do it on their own. But using this method, massive organizations like Facebook can and will develop the initial model and then provide it to hospitals as open-source code, which those health providers can then finish training using their own datasets and a single GPU.

Second, since the initial models are trained on generalized chest x-rays rather than COVID-specific data, these models could -- in theory at least, FAIR hasn’t actually tried it yet -- be adapted to other respiratory diseases by simply swapping out the data used for fine-tuning. This would empower health care providers to not only model for a given disease but also tune that model to their specific locality and circumstances.

“I think that's one of the really amazing things that the team did from Facebook,” Moore concluded, “is take something that is a tremendous resource -- CheXpert and MIMIC databases -- and be able to apply it to a new and emerging disease process that we knew very little about when we started doing this, in March and April.”