How Microsoft’s Story Remix does what Clippy couldn't

Deep learning makes all the difference.


Microsoft is making some bold promises with Story Remix, its recently announced app for the Windows 10 Fall Creators Update. Together with the company's deep learning technology, it can automatically craft your photos and videos into short films. Story Remix resembles Apple Clips and Google's Photo Assistant, but it goes a bit further with the ability to analyze everything on a pixel-level basis to detect people, objects and the overall setting.

If it works as advertised, it could be a transformational app for consumers fed up with their ever-growing libraries of digital media. It's the latest attempt by Microsoft to make your life easier by predicting what you want. But, as you'll recall, that hasn't always worked out well for the company.

"Clippy was actually a very early attempt at trying to look at patterns of what you're doing, but it didn't have the benefit of all this deep learning tech that's available now," said Microsoft Corporate Vice President Chris Pratley, the man behind Story Remix, in an interview with Engadget. With his previous project, Sway, he tried to rethink how you could create text- and media-based stories without worrying about formatting. Story Remix is the next logical step.

"The problem always holding us back is that we didn't understand what you were doing," he added. "If I were a human being, I could see if you're writing an article, or something like that. But we didn't know that. Everything was just text and pixels."

Now, Microsoft can actually figure out what those pixels mean with its Computer Vision AI technology. That makes it easier to add visual effects, like placing a fireball animation onto a soccer ball as it's about to be kicked. You just have to drop a 3D effect onto the soccer ball and let Story Remix do the mapping work.

"Anyone now can take off the shelf deep learning tech and see there's a face [in a photo]," Pratley said. "It's a lot more work to tell you whose face it is, and to distinguish it from another face." It's one thing to do that for an image, but Story Remix also has to be able to analyze every frame of a video, up to 60 frames per second. That lets it add special effects that would typically require a green screen, like placing a volcano in the background of a video.

"Ten years ago, with the way any creative tool was built, the basic tools didn't have a lot of features. And the really powerful ones had tons of features," Pratley said. "The model was, you had to bring your own time, talent and patience to [every project], and you got out of it what you put into it. It sounds great, but it really isn't."

In many ways, Story Remix looks a lot like the future of productivity apps. It's trying to cater to users of all sorts, from beginners who want to put the least amount of effort into creating movies to power users who want to spend a bit more time getting things just right. While there will surely be users who clamor for even more manual control in Story Remix, Pratley isn't too worried about that. He points out that adding manual controls is easy; the hard part is going further to make complex productivity tools simpler for everyone.

Of course, there are reasons to be skeptical of this push toward deep learning-enhanced productivity apps. Although the technology has made for some great demos, it hasn't transformed consumer apps just yet. And Microsoft will surely have a hard time living down the specter of Clippy. But if Story Remix succeeds, it could be huge.

"It's always been my dream as a creative tool builder, if I just knew what you were building, I could help you so much more," Pratley said. "And now we do."
