Microsoft Research always has the best toys, and is especially strong in augmented reality (AR) with projects like IllumiRoom and its insane HoloLens Minecraft game. Redmond's think tank has just revealed another impressive demo called "SemanticPaint" that lets you scan objects in 3D using a Kinect. While that's not new, Microsoft's latest magic trick is to separate and define individual objects in the scene. That might one day allow us to create a more accurate visualization of the world, a boon for things like robots and self-driving cars.
To pull it off, the researchers scanned the environment with a Kinect to create a 3D scene, which the computer at first sees as just a single object. However, a human user can touch and interact with objects in real time (as shown in the video below), then vocally call out the name of each ("banana," for instance). The system then separates the touched object from the surrounding surfaces while creating a "class" for each one. All of that is done using local resources (a laptop, for instance) for better interactivity.
Behind the scenes, however, software running on more powerful systems is learning as new objects are labeled, and can deduce if an object belongs to a specific class like "chair." In the last pass, it further refines the objects and creates a final 3D scene. Microsoft says that its online system can not only "rapidly segment 3D scenes," but also "learn from these labels" to perform the tasks better and faster in the future.
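To get a feel for the touch-and-label idea described above, here's a toy sketch in Python. This is purely illustrative and is not Microsoft's SemanticPaint code: it stands in for each segmented object with a hand-picked feature vector (invented numbers for size and color), and "learns" each spoken class incrementally by tracking a running average, then guesses the class of a new segment by nearest centroid.

```python
# Toy illustration of interactive labeling plus incremental learning.
# NOT the actual SemanticPaint system -- feature vectors and class
# names here are made up for demonstration.

class IncrementalLabeler:
    def __init__(self):
        # class name -> (running mean feature vector, sample count)
        self.centroids = {}

    def label(self, name, features):
        """User touches an object and speaks its name; update that class."""
        mean, n = self.centroids.get(name, ([0.0] * len(features), 0))
        new_mean = [(m * n + f) / (n + 1) for m, f in zip(mean, features)]
        self.centroids[name] = (new_mean, n + 1)

    def classify(self, features):
        """Guess the class of a new, unlabeled segment (nearest centroid)."""
        def sq_dist(mean):
            return sum((m - f) ** 2 for m, f in zip(mean, features))
        return min(self.centroids, key=lambda name: sq_dist(self.centroids[name][0]))


labeler = IncrementalLabeler()
labeler.label("banana", [0.2, 0.9, 0.1])   # small, yellow-ish (invented features)
labeler.label("chair",  [1.5, 0.3, 0.4])   # larger, darker (invented features)
print(labeler.classify([0.25, 0.85, 0.15]))  # a new banana-like segment -> "banana"
```

The real system works on dense 3D reconstructions and far richer features, of course, but the same principle applies: each user label refines a class model, so later scenes can be segmented and recognized with less manual help.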
Once the system is perfected, users may one day be able to just walk around with a Kinect or other depth-sensing camera and create a detailed map of a scene, complete with individual objects. The possibilities for using such maps are endless, but Microsoft cited a few examples "from robot guidance, to aiding partially sighted people, to helping us find objects and navigate our worlds, or experience new types of augmented realities."