Visual media has come a long way since the first proto-human cave dwellers used the flickering of torch light to bring the hand-drawn art on their walls to life. Today, the pixel, despite its humble, low-resolution origins, sits at the current pinnacle of digital display technology. In his new book, A Biography of the Pixel, Pixar co-founder Alvy Ray Smith examines the fascinating history and development of picture elements (hence "pix"-"el") from their often-contested start in the labs of pioneering computer researchers like Alan Turing to their ubiquitous presence in modern life. In the excerpt below, Smith takes a look at the bad old days before digital displays to explain the science behind our brain's ability to perceive motion through the rapid flashing of static images.
Excerpted from ‘A Biography of the Pixel’ by Alvy Ray Smith (MIT Press, 2021)
How Movies Were Really Done
What did the inventors of cinema do (or not) to make the system they gave us so non-ideal? First, they didn’t give us instantaneous samples as required by sampling. Film frames are fat. They have duration. The camera shutter is open for a short exposure time. A moving object moves during that short interval, and so smears slightly across the frame during the film exposure time. It’s like what happens when you try to take a long-exposure still photo of your child throwing a ball and his arm is just a blur. This turns out to be a saving grace of cinema as it was actually practiced.
Second, they made it so each frame is projected twice (at least) by the projector. Ouch! That’s not sampling at all. Why did the inventors do that? Simple economics demanded it: 24 frames per second costs half as much film as 48 frames per second. But the eye needs to be refreshed about 50 times per second, or the retinal image fades between frames. Actually, 48 is close enough to 50 to work in a dark theater. How do you get 48 from 24? You show each frame twice! If you show just 24 frames per second, the screen appears to flicker. Hence the “flicks” from the early days of cinema before higher frame rates were adopted.
The third thing the original inventors did was to shut off the light between projected frames. This meant that 48 times per second, nothing (blackness) was projected into the eye — inside the pupil, onto the retina. It’s convenient for movie machines — both the camera and projector part — to “shutter” to blackness like this between frames. It allows time for the mechanical advancement of the next film frame into position. In a camera, it keeps the film from recording the real world during the physical advancement of the film. In a projector, it keeps the moving film out of eyesight as it’s physically advanced.
When you ask how a movie projector works, some people say something like this: There’s a top reel of film which is the source of film, and a bottom take-up reel. The film moves from reel to reel and passes between the light source of the projector and its lens, which magnifies the frame-size image up to screen size. In other words, the film moves continuously past the light source. But that doesn’t work. The eye sees exactly what’s there, and with this scheme the eye would see one frame sliding away as the next frame slides in from the opposite side. It would see the sliding. And that won’t work.
What a projector actually does is exactly this: It brings each frame into fixed position with the light source blocked. That’s the function of the shutter. Then the shutter opens and the illuminated frame projects onto the screen. Then the shutter closes. Then it opens again, and the illuminated frame is projected a second time onto the screen. Then the shutter closes and the next frame slides into position, and so forth.
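The shutter cycle just described can be written out as a short sketch. This is only an illustration of the sequence in the text, not a model of any real projector: the function names `projector_events` and `flash_rate` and the event labels are hypothetical, and the defaults (24 frames per second, two flashes per frame) are the figures quoted earlier in the excerpt.

```python
def projector_events(num_frames, flashes_per_frame=2):
    """List the shutter/film events for an intermittent-movement projector.

    Hypothetical illustration: each frame is pulled into a fixed position
    while the shutter is closed, then flashed onto the screen one or more
    times before the next frame is advanced.
    """
    events = []
    for frame in range(num_frames):
        # Film moves only while the light is blocked.
        events.append(("advance", frame))
        for _ in range(flashes_per_frame):
            events.append(("open", frame))   # frame is projected
            events.append(("close", frame))  # blackness reaches the eye
    return events


def flash_rate(frames_per_second=24, flashes_per_frame=2):
    """Number of flashes per second that reach the eye."""
    return frames_per_second * flashes_per_frame


if __name__ == "__main__":
    for event in projector_events(2):
        print(event)
    print("flashes per second:", flash_rate())
```

With the default values, every frame produces two open/close pairs, which is how 24 frames per second of film becomes 48 flashes per second on the screen.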
We’ve just described the discrete, or intermittent, movement of film through a projector, as opposed to unworkable continuous movement. The same idea holds for a camera. The physical device that implements this action is called, in fact, an intermittent movement. This is the key notion in cinema history that is comparable to the conditional branch instruction in computer history. The mad rush to the movie machine turned on who first got a projector to work correctly, and that hinged on who got an intermittent movement working properly. It’s a defining notion.
To recap: An actual film-based movie projector doesn’t reconstruct a continuous visual flow from the frame samples and present this to the eye. Instead, it sends “fat samples” — thick with time duration and smeared motion — directly to the eye’s retina. It sends each frame twice, and it sends blackness between. It’s up to the brain to reconstruct motion from these inputs. How does that work?
Somehow the eye-brain system “reconstructs the visual flow” that’s represented by the fat visual samples it receives. Of course, it really does no such thing. Light intensities come in through the pupil as input. But the output from the eye to the brain, through the optic nerve, is an electrochemical pulse train. Neuronal pulse trains aren’t visual flows. It could be that the retina actually does reconstruct a visual flow and then converts it to pulse trains for brain consumption. The responses of some of the neurons in the eye certainly suggest the spreader function, complete with a high positive hump and negative lobes. But brain activity is beyond the scope of this book. Let’s concentrate instead on the customary explanations of the perception of motion from sequences of still snapshots.
Perception of Motion
The classic explanation is hoary old persistence of vision. It’s a real characteristic of human vision: once an image stimulus to the retina ceases, we continue to perceive the image there for a short while. But persistence of vision explains only why you don’t see the blackness between frames in the case of film-based movies. If an actor or an animated character moves to a new position between frames then — by persistence of vision — you should see him in both positions: two Humphrey Bogarts, two Buzz Lightyears. In fact, your retinas do see both, one fading out as the other comes in—each frame is projected long enough to ensure this. That’s persistence of vision. But it doesn’t explain why you perceive one moving object, not two objects at different positions. What your brain does with the information from the retinas determines whether you perceive two Bogarts in two different positions or one Bogart moving between them.
Psychophysicists have performed experiments to determine the characteristics of another real brain phenomenon, called apparent motion. The experiments don’t explain how the brain perceives motion, but they do describe the limitations of the phenomenon. A small white dot on a black background is presented to a subject’s retina. Then that dot is removed, and another dot is presented in a different position. The experimenters can vary two things: the spatial separation of the two dots and the time delay between the two presentations. The brain perceives one dot here and another dot there, but only if the distance and delay are large enough. If the distance and delay are short, the brain perceives that the dot moves from one position to the other. It’s apparent motion because no actual motion is presented to the eye. The brain perceives what it doesn’t see.
Persistence of vision is such that we still perceive the first image when the second one arrives. That sounds a lot like frame spreading. A frame of short duration spreads out in time and adds to the next frame also spread out in time. It’s as if the retina does the image spreading and the adding of successive spread frames. Something like this must be going on because we perceive a continuous visual field although the film projector doesn’t present one. You can think of the shape of the persistence function of the eye as the shape of the frame spreader that’s built into us human perceivers. Another reason we can assume that the eye-brain system must be doing a reconstruction, one that implicitly uses the Sampling Theorem, is that we perceive exactly the errors we would expect if that were the actual mechanism, such as wagon wheels spinning backward.
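The backward-spinning wagon wheel is the standard example of a sampling error (aliasing), and the arithmetic behind it is short. The sketch below rests on simplifying assumptions that are not in the excerpt: the wheel has identical, evenly spaced spokes, each frame is an instantaneous sample, and the viewer matches every spoke to the nearest spoke position in the next frame. The function name `apparent_rev_per_sec` and the choice of 8 spokes are hypothetical.

```python
import math


def apparent_rev_per_sec(true_rev_per_sec, spokes=8, fps=24.0):
    """Apparent rotation rate of a spoked wheel filmed at `fps`.

    Assumes identical spokes and instantaneous samples, so a rotation by
    one whole spoke spacing between frames is indistinguishable from no
    rotation at all. A negative result means the wheel appears to turn
    backward.
    """
    # How many spoke spacings the wheel really advances per frame.
    step = true_rev_per_sec * spokes / fps
    # Nearest-spoke matching: wrap the step into the range [-0.5, 0.5).
    aliased = step - math.floor(step + 0.5)
    # Convert back from spoke spacings per frame to revolutions per second.
    return aliased * fps / spokes


if __name__ == "__main__":
    for rate in (1.0, 2.5, 3.0):
        print(rate, "rev/s looks like", apparent_rev_per_sec(rate), "rev/s")
```

Under these assumptions, a wheel turning slowly is perceived correctly, a wheel that advances exactly one spoke spacing per frame looks frozen, and a wheel that advances slightly less than one spacing per frame appears to creep backward.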
Classic cel animation — of the old ink-on-celluloid variety — relies on the apparent-motion phenomenon. The old animators knew intuitively how to keep the successive frames of a movement inside its “not too far, not too slow” boundaries. If they needed to exceed those limits, they had tricks to help us perceive the motion. They drew actual speed lines, which showed the brain the direction of motion and implied that it was fast, like a blur. Or they provided a POOF of dust to mark the rapid descent of Wile E. Coyote as he stepped unexpectedly off a mesa in hot pursuit of that truly wily Road Runner. They provided a visual language that the brain could interpret.
Exceed the apparent motion limits without those animators’ tricks, and the results are ugly. You may have seen old-school stop-motion animations, such as Ray Harryhausen’s classic sword-fighting skeletons in Jason and the Argonauts (1963), that are plagued by an unpleasant jerking motion of the characters. You’re seeing double, at least (several edges of a skeleton at the same time), and correctly interpret it as motion, but painfully so. The edges stutter, or “judder,” or “strobe” across the screen. Those words reflect the pain inflicted by staccato motion.
Live-action movies are sequences of discrete frames just like animations. Why don’t these movies stutter? (Imagine directing Uma Thurman to stay within “not too far, not too slow” limits.) There’s a general explanation that works. It’s called motion blur, and it’s simple and pretty. A frame that’s recorded by a real movie camera is fat with duration. It’s not a sample at a single instant like a Road Runner or a Harryhausen frame. Motion blur is what you see in a still photograph when the subject moves and the shutter isn’t fast enough to stop the motion. In still photographs, it’s often an unintended result, but it turns out to be a feature in movies. Without the blur all movies would look as jerky as Harryhausen’s skeletons—unless Uma miraculously stayed within limits. The motion blur of moving objects in a fat frame gives clues to the brain about what is moving and what is not. The direction of a blur gives the direction of motion, and its length indicates the speed. Somehow, mysteriously, the brain converts that spatial information — the blurs — into temporal information and then perceives motion with the help of the apparent motion phenomenon.
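The relationship between blur length and speed can be illustrated with a toy one-dimensional "fat frame." This is a minimal sketch under stated assumptions, not a description of any real camera: a single bright dot moves at constant speed along a row of pixels, and the exposure is approximated by averaging many instantaneous sub-samples. The names `fat_frame` and `streak_length`, and the example values (240 pixels per second, a 1/48-second exposure), are hypothetical.

```python
def fat_frame(width, start_pos, speed, exposure, substeps=100):
    """Expose one row of pixels to a dot moving at `speed` pixels/second.

    The shutter is open for `exposure` seconds. The result approximates
    the smeared frame by averaging `substeps` instantaneous samples.
    """
    frame = [0.0] * width
    for i in range(substeps):
        t = exposure * i / substeps        # time since the shutter opened
        x = int(start_pos + speed * t)     # pixel under the dot at time t
        if 0 <= x < width:
            frame[x] += 1.0 / substeps     # each sample adds a little light
    return frame


def streak_length(frame):
    """Length in pixels of the lit streak (0 if nothing was recorded)."""
    lit = [i for i, value in enumerate(frame) if value > 0]
    return lit[-1] - lit[0] + 1 if lit else 0


if __name__ == "__main__":
    still = fat_frame(width=40, start_pos=10, speed=0.0, exposure=1 / 48)
    moving = fat_frame(width=40, start_pos=10, speed=240.0, exposure=1 / 48)
    print("still dot streak: ", streak_length(still))
    print("moving dot streak:", streak_length(moving))
```

In this toy model the streak grows in proportion to speed multiplied by exposure time, which is the spatial clue the passage describes: a longer blur means faster motion, and the direction of the streak is the direction of travel. An instantaneous sample (zero exposure) would record a single pixel regardless of speed, like a stop-motion frame.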