Before PrimeSense's technology, most gestural control systems were based on a "time-of-flight" method -- infrared light (or some other invisible frequency of light) was sent out into a 3D space, and the system worked out what the space looked like from the time the light took to return to specially tuned cameras and the wavelengths that came back. PrimeSense's method instead encodes information in the light patterns as they go out, and the camera looks for the deformation of those patterns.
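To make the contrast a little more concrete, here is a minimal sketch of the textbook arithmetic behind each idea. PrimeSense did not share its actual math with us, so the function names, the pinhole triangulation model, and the sample numbers are all illustrative assumptions rather than the company's method.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def time_of_flight_distance(round_trip_seconds: float) -> float:
    """Distance to a surface from the round-trip time of a light pulse."""
    # The pulse travels out to the surface and back, so halve the total path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def pattern_shift_depth(focal_length_px: float, baseline_m: float, shift_px: float) -> float:
    """Depth from how far a projected pattern feature appears shifted.

    Assumption: a generic pinhole triangulation model, with the projector
    and camera separated by baseline_m. This is not PrimeSense's published
    algorithm, only the standard relation depth = focal * baseline / shift.
    """
    if shift_px <= 0:
        raise ValueError("shift_px must be positive")
    return focal_length_px * baseline_m / shift_px


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the two functions.
    print(time_of_flight_distance(20e-9))
    print(pattern_shift_depth(580.0, 0.075, 14.5))
```

The practical difference shows up in what has to be measured: the first function needs timing accurate to fractions of a nanosecond, while the second only needs the camera to notice how far a known pattern has moved in the image.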
Once the camera receives the IR light back, it gets an image similar to the one above -- you can see me sitting with a notepad on the left, and a few other people from PrimeSense around the small room on the right. The computer builds a basic shape of the room and the people in it from what the camera sees, and then the real processing starts.
PrimeSense actually developed a chip that sits right in the camera device, and that's where the deciphering of the image starts. It looks for any shapes that appear to be a human body (a head, a torso, two arms, and two legs), and then starts calculating things like how those arms and legs are moving, where they can move (your arms probably can't fold backwards at the elbow, for example), and where they'll be in a few microseconds.
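As a rough illustration of those two ideas (limiting a joint to positions it can physically reach, and guessing where it will be next), here is a small sketch. It is not PrimeSense's chip logic: the `JointSample` type, the 0 to 150 degree elbow range, and the straight-line extrapolation are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class JointSample:
    """One observation of a joint angle (hypothetical structure)."""
    time_s: float
    angle_deg: float


def clamp_elbow_angle(angle_deg: float, min_deg: float = 0.0, max_deg: float = 150.0) -> float:
    """Keep an elbow angle inside an assumed anatomical range."""
    # An elbow does not bend backwards, so anything below min_deg is rejected.
    return max(min_deg, min(max_deg, angle_deg))


def predict_angle(previous: JointSample, current: JointSample, lookahead_s: float) -> float:
    """Extrapolate the angle a short time ahead, assuming constant speed."""
    dt = current.time_s - previous.time_s
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    velocity_deg_per_s = (current.angle_deg - previous.angle_deg) / dt
    # The prediction is clamped too, so it never leaves the reachable range.
    return clamp_elbow_angle(current.angle_deg + velocity_deg_per_s * lookahead_s)
```

Clamping the prediction as well as the measurement means a fast swing can never be projected into a pose the arm couldn't actually reach.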
A lot of this processing is done by Microsoft in its own software as well, and things like interfaces and the Kinect API weren't created by PrimeSense either -- those are both handled on Microsoft's end. "The vision of natural interface is something that was cooked up on Microsoft's side," I was told, "but they were waiting for the kind of technology that would enable it." PrimeSense's system does the basic calculations about what the computer sees as human and how it reports that to the Xbox itself.
PrimeSense reps also told me that the camera can "see" any number of people on the screen -- you can fit as many people in that camera as possible, and the computer will see all of them and can even recognize them as human shapes. But it can only run calculations on two people at a time, just because the processing power required to track all of the body's locations and movements is so great (Update: See below). During our testing with the device, a person moving in front of the camera was able to "steal focus," but the computer can also be told through gestures to keep focus on a certain person.
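The "steal focus" behavior can be sketched in a few lines. This is only a guess at the logic, not the demo's code: the `FocusTracker` class, the idea of ranking people by distance from the camera, and the `lock` method standing in for the keep-focus gesture are all assumptions, and `max_tracked` defaults to two simply to mirror the limit described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FocusTracker:
    """Chooses which visible people get full body tracking (hypothetical)."""
    max_tracked: int = 2
    locked_ids: Set[str] = field(default_factory=set)

    def lock(self, person_id: str) -> None:
        """Stand-in for the gesture that keeps focus on one person."""
        self.locked_ids.add(person_id)

    def select(self, distances_m: Dict[str, float]) -> List[str]:
        """Return the ids to track, given each visible person's distance."""
        # Locked people who are still visible keep their slots first.
        locked = sorted(
            (p for p in distances_m if p in self.locked_ids),
            key=lambda p: distances_m[p],
        )
        # Everyone else competes by distance, so a person stepping in
        # front of the camera takes a free slot from someone farther back.
        others = sorted(
            (p for p in distances_m if p not in self.locked_ids),
            key=lambda p: distances_m[p],
        )
        return (locked + others)[: self.max_tracked]
```

With three people at 2, 3 and 1 meters, the sketch tracks the two nearest; once the farthest person is locked, that person keeps a slot and the nearest of the rest takes the other.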
PrimeSense is also licensing the technology to non-gaming devices like media centers, coming to store shelves as soon as 2011 -- they are selling developer kits now, and showed me a demo of a possible interface for watching and browsing movies. Microsoft has done a lot already with this hardware, and PrimeSense is aiming to get it out into even more devices and homes in the future.
Update: PrimeSense has gotten in touch with us to say that the two-user limit applied only to the demo software it showed off at E3, not to the hardware itself. Theoretically, with enough processing power, the PrimeSense hardware licensed for Kinect could support any number of players.
But the company can't speak to us about Microsoft's implementation or how many players it can support. We'll try to check with Microsoft and let you know what we find.