Humans are usually good at isolating a single voice in a crowd, but computers? Not so much -- just ask anyone trying to talk to a smart speaker at a house party. Google may have a surprisingly straightforward solution, however. Its researchers have developed a deep learning system that can pick out specific voices by looking at people's faces when they're speaking. The team trained its neural network model to recognize individual people speaking by themselves, and then created virtual "parties" (complete with background noise) to teach the AI how to isolate multiple voices into distinct audio tracks.
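To get a feel for the "virtual party" idea, here is a minimal sketch of how such a training example could be assembled: take clean single-speaker clips, add them together with background noise, and keep the clean clips as the answers the model should recover. This is an illustration only, not Google's code. The function names (`sine_voice`, `make_party`), the sample rate, and the noise level are all hypothetical, pure tones stand in for real speech, and the visual side of the system (the faces) is left out entirely.

```python
import math
import random


def sine_voice(freq_hz, n_samples, sample_rate=16000):
    """Hypothetical stand-in for a clean single-speaker clip: a pure tone."""
    return [
        math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n_samples)
    ]


def make_party(clean_tracks, noise_level=0.1, seed=0):
    """Mix clean tracks plus Gaussian noise into one synthetic 'party' signal.

    Returns (mixture, targets, noise). The mixture is what a separation
    model would hear; the targets are the clean tracks it should output.
    """
    n = len(clean_tracks[0])
    if any(len(track) != n for track in clean_tracks):
        raise ValueError("all clean tracks must have the same length")

    rng = random.Random(seed)  # seeded so the example is repeatable
    noise = [rng.gauss(0.0, noise_level) for _ in range(n)]

    # Sample-by-sample sum of every voice, then add the background noise.
    mixture = [
        sum(track[i] for track in clean_tracks) + noise[i]
        for i in range(n)
    ]
    return mixture, clean_tracks, noise


# Two fake "speakers" at different pitches, 0.1 seconds each at 16 kHz.
voices = [sine_voice(220, 1600), sine_voice(330, 1600)]
mixture, targets, noise = make_party(voices)
```

A real pipeline would use recorded speech and pair each clip with video of the speaker's face, but the basic shape is the same: because the mixture is built from known parts, the correct separated tracks are available for free as training targets.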