
Sora's explosive "Balloon Man" video is revealed behind the scenes: it is well-made but there are copyright and consistency concerns

Author: Not bald programmer

OpenAI's video-generation tool, Sora, surprised the industry in February with smooth, lifelike videos that appeared to be well ahead of the competition. However, this carefully staged debut left the public without many of the details.

Recently, Shy Kids, one of the teams behind OpenAI's popular promotional videos, spoke with the media about the highs and lows of being among the few video creators to get early access to Sora.


One of OpenAI's Sora promotional shorts: Air Head ("Balloon Man")

Shy Kids, a Toronto-based digital production team, was one of the few teams OpenAI selected to make short films, mainly for OpenAI's promotional purposes. The team was given considerable creative freedom when creating "Air Head."

It is worth noting that the short films were not generated entirely by Sora. In a media interview, post-production artist Patrick Cederberg described actually using Sora as "just one part of the work."

The public, however, may assume that these realistic, vivid short films were made entirely by Sora.

In reality, they were professionally produced, complete with robust storyboarding, editing, color correction, and post-production work such as rotoscoping and VFX.

Apple advertises "Shot on iPhone" without showing the studio setup, professional lighting, and color work done afterward. In the same way, OpenAI's Sora post only talks about what the tool lets people do, not how they actually did it.

Cederberg's interview is interesting and quite non-technical. As impressive as the Sora model is, the reality may be less rosy than we think.


There is no proper set of features to fully control consistency

"At the moment, control is still both the most desirable thing and the most elusive. … The closest we could get was being hyper-descriptive in our prompts. Describing the characters' wardrobe, as well as the type of balloon, was our way of handling consistency, because shot to shot and generation to generation, there isn't yet a proper feature set for full control over consistency."

In other words, things that are simple in traditional filmmaking, such as choosing the color of a character's clothing, need elaborate workarounds in a generative system, because each shot is created independently of the others. This may improve in the future, but for now it is certainly very laborious.

Sora's output also had to be watched for unwanted elements. Cederberg describes how the model would generate a face on the balloon that serves as the protagonist's head, or a string hanging down the front. If the team couldn't get the prompt to exclude these, they had to be removed in post-production, which was another time-consuming process.


Precise timing and control of character or camera movement is also practically impossible. "There's a little bit of temporal control over where these different actions happen in the actual generation, but it's not precise … it's kind of a shot in the dark," said Cederberg.

For example, timing a gesture such as a wave is a very approximate, "suggestion-driven" process, unlike manual animation. A shot such as a pan up the character's body may or may not deliver what the filmmaker wants. In this case, the team rendered a shot composed in portrait orientation and did a crop-and-pan in post. The generated clips were also often in slow motion for no particular reason.


A shot from the short film and how it was generated with Sora

In fact, Cederberg says, the team was surprised to find that everyday filmmaking language, such as "pan right" or "tracking shot," produced inconsistent results.

"The researchers hadn't really been thinking like filmmakers before they let artists use this tool," he said.

As a result, the team ran hundreds of generations, each 10 to 20 seconds long, and ended up using only a handful of them. Cederberg estimates the ratio at 300:1, though most of us would probably be surprised by the shooting ratio on an ordinary production too.

The team also made some behind-the-scenes videos explaining the problems they ran into. As with much AI-related content, the comments are quite critical of the whole effort, though not as vitriolic as the reaction to the AI-assisted ads we've seen recently.


Sora has some mechanism for refusing generations that raise copyright issues

The last interesting issue has to do with copyright. If you ask Sora for a "Star Wars" clip, it will refuse. If you try to get around this with "robed man with a laser sword on a retro-futuristic spaceship," it will also refuse, because some mechanism recognizes what you are trying to do. It also refused to produce an "Aronofsky-style shot" or a "Hitchcock zoom."

On the one hand, this makes perfect sense. But it does raise a question: if Sora knows what these things are, does that mean the model was trained on that content, the better to recognize when a request is infringing? OpenAI keeps its training data cards close to the vest, to the point of absurdity, as CTO Mira Murati's interview with Joanna Stern showed. The company will almost certainly never tell us.

As for Sora and its use in filmmaking, it is clearly a powerful and useful tool in its place, but that place is not "making films out of whole cloth." At least not yet: "that comes later."
