Minority Report UI designer John Underkoffler talks about the future of gestures

Unless you're an engineer who designs cutting-edge user interfaces, you've probably never heard the name John Underkoffler before ... but you've definitely seen his work. Remember the computer Tom Cruise uses by waving his hands around in Minority Report? He designed that. And it wasn't just faked together for the movie: Underkoffler had a working prototype at MIT before that, and now he's designed a version that you can actually purchase.

Underkoffler now serves as the Chief Scientist at Oblong Industries, and we spoke with him recently in connection with the now-on-Blu-ray Minority Report, and he talked extensively about the future of computing, how video games are driving technology forward, what in the hell the game Tempest is all about, and why it's time for the mouse to die. Read on after the break for the full interview.

Obviously, this is timely because Minority Report has just hit Blu-ray. Though the film has been available for a while, we're now living in this day and age where gesture-based computing seems like it's totally going to happen any moment. What's your background? I know you're an engineer, but how did you go from there to Minority Report?

The origin story?

Right. Exactly.

"The mouse has run its course and enough is enough."

The first place I got to was school, and I ended up there for a long time. It was a confluence of a bunch of great beginnings, I suppose. So I got to MIT in 1984 just before the MIT Media Lab was opening, and it couldn't have been a better fit. So I was involved with that right from the very beginning, even as I pursued all of the silly degrees and so forth. But, you know, there was that special excitement that is in the beginning of a thing. And we really felt like we were onto a bunch of new stuff. And my work there turned more and more towards new kinds of interfaces, novel UIs, partly because it felt to me like, you know, at the time it was only 10 years since the Macintosh sort of made the mouse and the window space GUI pervasive. It seemed like 10 years was great, and shouldn't we be inventing a new one? And curiously enough, the Web happened along right about at that moment and I think everyone got distracted and people forgot to invent a new interface when, really, commercially, that should have happened. But that's a matter for history to sort out.

I ended up building a system called the Luminous Room that was all about bringing together input and output and letting people attach digital meanings to physical objects and kind of bringing the digital world and the physical world into very intimate contact in a way that traditional interfaces really didn't get at. And I was just finishing that work up in about 1999 when an advance phalanx from the Minority Report pre-production team visited my lab. And most particularly, of course, Alex McDowell, the film's amazing production designer, showed up and he was poking around and looking at all the stuff that was going on at the lab at the time. But in addition to being on kind of a general shopping spree, if you will, for emerging technology that could be plausibly imported into or extrapolated toward the year 2054, he knew already that there was a kind of data manipulation problem that the film needed to address in a very foregrounded way, mainly those themes of kind of forensic analysis in the pre-cog dreams. And I think he and I hit it off, and I've been a film nut forever. It was very easy to talk about how the ideas that we were building for real at the lab could be adapted for use and made even more cinematic than perhaps they already were. So that brought me out to LA to work on Minority Report.

I read a quote where you said that, "The future that was actually presented in the film was a lot grayer than the one that came out of that initial convergence meeting where everyone met with Spielberg to just map out the year 2054."

Yeah, that's true. And it's odd for me to say, but I'm proud of that aspect of the film. It's pleasing to me that the film does play with ideas of moral ambiguity in that way. You know, the best science fiction always does that. It addresses not only the fun of gadgets but the social, and political, and personal, and psychological consequences of technology. And ideally, if you are in a Philip K. Dick world, all that stuff is colliding with enormously unintended consequences. So it was just great to see that play out in the film.

What's funny is that this is a Steven Spielberg film that stars Tom Cruise, but the iconic moment that has stayed with everyone, that everyone always talks about when they refer to that film, is the sequence where he's using the computer to locate the future suspect and to find him. Were you present for all of that, and how much of all of that was your input?

I was absolutely present for all of that stuff, and it is not any coincidence at all that it looks like the current G-Speak. It is gratifying, certainly, that people still remember those sequences 10 years on and that, as you say, it is one of the film's iconic images. I think that part of the reason that all that stuff worked so well is that Steven, indirectly, and Alex McDowell, directly, because that's how he worked, kind of in an unbelievable depth and thoroughness, let us come at that design problem, really, as if it were a real world design problem.

That's how I came at it. I sort of sat there and said, "Well, I have, in the past, built stuff very much like this that actually functions. How would we design this domain specific interface so that not only does it look great on screen, but if we had to build it a week later, we could and it would make sense." And so, a really large amount of research, and thought, and design effort went into designing the gesture language and then training the actors in that language. So Colin Farrell, Neil McDonough, and Tom Cruise all spent a while practicing. And so, when it came time, actually, to film, what we had was the opposite of the normal circumstance, which is where the director says, "Oh, just have the guy wave his arms around and the editor will have to sort it out and we will figure out what to do with it later."

It was extremely planned, and there was a cognitive logic behind everything. So Spielberg would say, "Okay. We want to get a sequence here where he is looking at a pre-vision that shows an architectural detail out the window. That's how we are going to figure out that we are in Georgetown or Barnaby Woods, or whatever. And then we should have them pan down, and there are the bloody scissors, and blah, blah, blah." And so, kind of in real-time, the actors and I would translate that into the gesture language that had already been laid out, and that the actions were already, essentially, set with that. And then, you know, you dive out of the way, the cameras roll. And even though they are not seeing anything on the screen, as you said, it is going to be added in later, they know exactly what it is that they are doing. They know what the manipulations are, what they would be seeing, essentially. And so what you end up with is a sequence where you can see the cause and effect, and that's huge. You know, it's...I felt like we were really trying to respect the audience in a way that analogous sequences in other movies, that probably we shouldn't name, haven't. You know, let's show the audience something that really could exist. And, as it happens, now it does.

Because in another famous Philip K. Dick movie, Blade Runner, there is the whole really cool system where he scans the photo in and uses voice commands, but it doesn't quite match up with what he's saying and the pictures even look different. It doesn't feel as real as it does in Minority Report.

And you were trying to figure out how it worked. You know, he's been swilling scotch for the last 50 minutes or whatever, stumbling around going, "Track 14 to 35..." What does that mean, you know? And it is actually just a little bit cooler to let an audience figure out how it does work. Actually, I guess there is a lot of this stuff in Phil Dick adaptations. Like, Paycheck had another such sequence.

It's weird to think that Minority Report is now eight years old. Somehow it still feels fresh. Even now we are in a day and age where there's the iPad and iPhone, where you're using gestures on the screen. Microsoft is coming out with Project Natal for the Xbox, which is gesture-based, and Sony has announced its motion controller, the Move. We have the Wii, of course. Is it strange to you at all, as an engineer, that gaming is sort of driving that forward instead of computing, or does it seem sort of natural?

"The best and kind of most thoughtful work in UI design for the last 15 to 20 years has happened almost entirely in the domain of game designers."

No, not at all. From my point of view the best, and kind of most thoughtful, work in UI design for the last 15 to 20 years has happened almost entirely in the domain of game designers and game developers. And that is a circumstance that is not generally acknowledged. I mean there is stuff like the ACM SIGCHI conference where the people who are academics who work on this sort of thing all convene. And there is kind of a blind eye in those venues, if you will, turned toward the work that is happening in games, and yet that's where all of the really, really deep stuff is happening.

You know, how can a single person have six degrees of camera control and this and that and how can all that happen in real time and in such a way that the geometric surround is always evident to the player, etc., etc. There are very real problems being solved there that aren't being solved anywhere else, historically. But now it's really exciting to see people finally waking up. And you've rattled off the list of the major players and the usual suspects, and that's exactly right. That's great that we've finally gone, "Wait a minute. We don't have to be stuck with a mouse forever and drop down menus and scroll bars and the rest of it."

What is your opinion on those different systems?

Again, in general, it's great that real resources are being put against these ideas. Jeff Han's work with Perceptive Pixel is sort of the earliest really serious multi-touch contender. It's really gorgeous. If you haven't looked at those videos you probably ought to. And Microsoft Surface, they did a really nice job with that. And the gaming systems are all bringing something new to the gaming experience. I think what's going to be real interesting to see is what new kinds of interaction modalities and game design features people cook up, because it's not going to be the initial developers; it's probably not going to be Sony and Microsoft and even Nintendo that bring out the killer app game. It's going to be someone else. I think it's the same problem with 3D in films in a way. It's a nice gimmick and then after that rush is over, it's going to take a while to figure out what additional idioms you build around this third dimension or this new way of seeing. And filmmakers are just starting to work on figuring that out. And I think it's going to be the same with more body-centered inputs for video games.

Sony's Move and the Wii both use a wand for input, which is somewhat foolproof. But when you're only using your hands we're wondering how the computer is going to distinguish between accidental input and what the user is actually intending. That's addressed in Minority Report when Tom Cruise goes to shake Colin Farrell's hand and he accidentally wipes all the windows off the screen. Was that your idea to stick that in to make it a little more realistic?

Yeah, it was a very nervous moment for me because it suddenly occurred to me that, "Wow. What happens if his concentration is interrupted and he's wearing the glove still and he goes to do something that's in the human domain instead of in the UI domain?" And so I bided my time and waited for a moment when Steven didn't seem to be entirely busy and approached him and said, "I don't know if this is of interest to you, but we could do a little gag where he goes to shake Colin's hand and it's sort of an autonomic response when someone extends their hand you shake it. And he basically destroys all the work he spent the last ten minutes doing." And it seemed to be an appealing idea so we shot it.

As someone who's designing these sort of systems, is that one of the biggest hurdles, making sure you recognize what is intended and not what's accidental?

"It's probably not going to be Sony and Microsoft and even Nintendo that bring out the killer app game. It's going to be someone else."

It's not the biggest hurdle but it is something we really do have to pay attention to. So we have a couple of ways of dealing with that. The easiest and sort of most straightforward one is a kind of a time out gesture. And depending on how we configure our system sometimes, it literally is that. You know, one hand on top of the other making a T just to tell the system "Stop listening. Whatever happens after this I'm not talking to you." We can also sort of condition the system so that if you take a step backwards or if your hands aren't kind of in the right general zone then it doesn't listen. I think those things tend to need to be really context specific. It's easy with a mouse, of course, because you take your hand off the mouse and the chunk of plastic sits there on the table and no one's moving it so it doesn't do anything. But it's not like you can take your hands off of your body.
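As a rough illustration of the two safeguards described above (an explicit time-out gesture and a zone check), here is a minimal Python sketch. It is not Oblong's implementation: the pose labels, the `resume` pose, the `GestureGate` class, and the 2-metre zone are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandSample:
    """One reading from a (hypothetical) hand tracker."""
    pose: str          # assumed recognizer label, e.g. "point", "timeout_T"
    distance_m: float  # assumed distance of the hands from the screen, in metres


class GestureGate:
    """Decides whether a tracked gesture should be treated as UI input."""

    def __init__(self, max_distance_m: float = 2.0) -> None:
        self.listening = True
        self.max_distance_m = max_distance_m  # assumed size of the active zone

    def accept(self, sample: HandSample) -> bool:
        # The "T" time-out pose tells the system to stop listening.
        if sample.pose == "timeout_T":
            self.listening = False
            return False
        # Assumption: some explicit pose turns listening back on.
        if sample.pose == "resume":
            self.listening = True
            return False
        if not self.listening:
            return False
        # Ignore hands outside the active zone (e.g. the user stepped back).
        return sample.distance_m <= self.max_distance_m


gate = GestureGate()
in_zone = gate.accept(HandSample("point", 1.0))       # inside the zone, listening
stepped_back = gate.accept(HandSample("point", 3.0))  # outside the zone
gate.accept(HandSample("timeout_T", 1.0))             # user signals time out
after_timeout = gate.accept(HandSample("point", 1.0)) # ignored until resumed
```

In this sketch the handshake from the film would only be harmless if the user had made the time-out pose first, which is why such rules tend to be context specific.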

If I was reading this correctly, after Minority Report came out that got people interested in the system that was used in the film and you were able to get funding for that and now it's actually become a reality. You're selling the system to early adopters and you're working on it currently. Is that the case?

That's right. Everything you said is true. A few of us who are builders and engineers kind of by nature felt like we had to get back into the lab and build this stuff for a third time. We'd built it twice before; we built a version of it at MIT in an academic context. And then we built a slightly fictional version, but a version that was no less carefully designed, for the film. And we really believe in those ideas. We really think that in five to ten years that's how everyone's going to operate every desktop and every laptop machine. But not just that, you know, every toaster, every microwave oven, every car dashboard, your living room. Everywhere where there are computers, whether we call them that or not. It just has to happen. The mouse has run its course and enough is enough. So we pulled together a company, that's what Oblong is, around these ideas and with the explicit goals of developing all that technology and getting the world to a place where that makes sense. And it's the system we call G-Speak.

G-Speak. So that I guess is short for gesture speak or your own version of that?

Well, it's gesture but it's also graphics and it's also geometry. A big part of the recognition is not so much about the gesture or language. That's an important piece, but it's not the only one. In fact, the biggest aha is that you need to build the system so that real world geometry is built in right down to the very lowest level of the software. So that the software acknowledges that the pixels are literally in the room with you. The screen is not just a collection of pixels at integer X and Y coordinates. Each pixel really has an X, Y, Z coordinate in the room and it's the same coordinate system that you share as a human and all the objects in the room. And all of a sudden, that connection opens up a huge number of new ideas and new possibilities.
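To make the idea of pixels having room coordinates concrete, here is a small Python sketch. It is only an illustration under stated assumptions, not G-Speak's actual API: the `Screen` class, its fields, and the example measurements are hypothetical. It assumes a flat screen described by the room position of its top-left pixel, two perpendicular unit vectors along its columns and rows, and a physical pixel pitch in metres.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


@dataclass(frozen=True)
class Screen:
    origin: Vec3   # room position of the top-left pixel, in metres
    right: Vec3    # unit vector along increasing column index
    down: Vec3     # unit vector along increasing row index
    pitch: float   # physical size of one pixel, in metres

    def pixel_to_room(self, col: float, row: float) -> Vec3:
        """Room-space X, Y, Z of the pixel at (col, row)."""
        offset = add(scale(self.right, col * self.pitch),
                     scale(self.down, row * self.pitch))
        return add(self.origin, offset)

    def ray_to_pixel(self, start: Vec3, direction: Vec3) -> Optional[Tuple[float, float]]:
        """Pixel (col, row) where a pointing ray meets the screen plane, if any."""
        normal = cross(self.right, self.down)
        denom = dot(direction, normal)
        if abs(denom) < 1e-9:      # ray runs parallel to the screen
            return None
        t = dot(sub(self.origin, start), normal) / denom
        if t <= 0:                 # screen plane is behind the pointer
            return None
        rel = sub(add(start, scale(direction, t)), self.origin)
        return (dot(rel, self.right) / self.pitch,
                dot(rel, self.down) / self.pitch)


# Hypothetical wall screen: top-left corner 2 m up, 1 mm pixels, facing the room.
wall = Screen(origin=(0.0, 2.0, 0.0), right=(1.0, 0.0, 0.0),
              down=(0.0, -1.0, 0.0), pitch=0.001)
point_in_room = wall.pixel_to_room(100, 200)
pointed_pixel = wall.ray_to_pixel((0.1, 1.8, 1.5), (0.0, 0.0, -1.0))
```

Because the hand and the pixels share one coordinate system in this sketch, "what is the user pointing at" reduces to a ray and plane intersection rather than a cursor lookup. The returned pixel may lie outside the screen's actual bounds; checking that is left out for brevity.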

I saw that you have been involved with some other films. Can you tell us what else you've worked on?

I think it was not long after Minority Report wrapped I did a little bit with a miniseries called Taken from the Sci-Fi channel. But the next big one was Hulk working with Ang Lee. The original Hulk, I suppose, it's important to say now, which was a fantastic experience and I met my wife on that film. She and I worked together on the title sequence. And that was another kind of signature science consulting moment in films, such as they are. And there aren't a lot of them. But what was great is that Ang Lee is such a synthetic mind and for him, because the context of the characters was that most of them are scientists and science administrators, those details for him are no less important than the score or the costume design and so forth. Also, his wife is a microbiologist so it was another impetus to come at the thing with some verisimilitude. But he really got into it. And for him, all of the visual details and the kind of procedural details of the biology stuff that the film centers on are allusive and symbolic as well. And he loves finding ways that that stuff could resonate with other elements of his movie making process. The cell at the smallest scale looks, in some ways, like the galaxy at the larger scale and those kind of convergences work great for him.

I also worked on Aeon Flux with Karyn Kusama who's a really great director. Again, a really great thinker who's done a lot of interesting, early stage design work there. And of course bits and pieces on a bunch of other movies terminating with the UI in Iron Man.

Do you play any games when you're not working?

That's the problem, right? When you're not working. I often ask my son to do game research for me, which is a great excuse, from his point of view, to play games, but it's also really important for me to kind of be aware of what's out there. I love the kind of games that sort of invent a new genre as they go along. So the first time you play Katamari Damacy, you are like, "Wow! That is great." Pikmin had a little bit of that aspect to it. There's a really cool arc that at least exists in my mind that includes the game Rez. Do you know that one?

Oh, sure. Yeah.

What I love about that game is that it's so non-representational. Henry Jenkins, I think, has said that nothing has retarded the advance of video games and game design and gameplay so much as the drive to photorealism and representation. And Rez is so deeply weird and non-representational. It's kind of a stunning work in and of itself. But to me there's a dotted line that goes from there straight back to Tempest, because ... what are you? What is Tempest? It's hilarious to think about.

You mean conceptually? Wow, I don't know. You're looking down a giant tube and making sure things don't come up. Yeah, I've never heard anyone break it down that way.

"It doesn't have to be a Hollywood level narrative to make the thing completely engaging."

Exactly. You are a what? You're a sort of angular thing that kind of spiders around on the outside. And the great thing is it doesn't matter. There's narrative enough in the geometry and the dynamics and the physics of the thing and the kinetics. And it's such a sort of obvious secret weapon thing, but I think it's mostly forgotten. There doesn't have to be this kind of traditional narrative. It doesn't have to be a Hollywood level narrative to make the thing completely engaging. So those are the kinds of games that really crack me up.

Well that's great. Thank you so much for taking the time and we look forward to getting our own hands on G-Speak and playing around with it. Thanks so much for your time.

This article was originally published on Joystiq.