Deep learning systems can already detect objects in a scene, including people, but they can't always make sense of what those people are doing. Are they about to get friendly? Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) might be able to help. They've developed a machine learning algorithm that can predict when two people are about to high-five, hug, kiss or shake hands. The trick is to have multiple neural networks each predict a different visual representation of the people in a scene, then merge those guesses into a broader consensus. If most of the networks foresee a high-five based on arm motions, for example, that becomes the final call.
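To make the "consensus" step concrete, here is a minimal Python sketch of a vote over several networks' guesses. It is an illustration of the voting idea as described above, not CSAIL's actual code. The function name `consensus` and the label strings are hypothetical, and each network is assumed to have already reduced its prediction to a single action label.

```python
from collections import Counter

# Hypothetical labels for the four interactions mentioned in the article.
ACTIONS = ("high-five", "hug", "kiss", "handshake")


def consensus(predictions: list[str]) -> str:
    """Return the action predicted by the most networks.

    With more than two possible actions this is a plurality vote: the
    winner needs the most votes, not necessarily more than half. Ties go
    to the label seen first, because Counter.most_common keeps
    first-encountered order for equal counts.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    unknown = set(predictions) - set(ACTIONS)
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    label, _count = Counter(predictions).most_common(1)[0]
    return label


# Five hypothetical networks, three of which foresee a high-five.
votes = ["high-five", "hug", "high-five", "handshake", "high-five"]
print(consensus(votes))  # prints "high-five"
```

A simple vote like this treats every network equally. A real system might instead weight each network by its confidence, which this sketch does not attempt.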