We Can All Be Video Game Characters With This AI

Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. The title of this paper is very descriptive:
Controllable Characters Extracted from Real-World Videos. This sounds a little like science fiction,
so let’s pick this apart. If we forget about the controllable part,
we get something that you’ve seen in this series many times – pose estimation. Pose estimation means that we take a human
character in an image or a video, have a computer program look at it, and have it tell us the
pose this character is currently taking. This is useful for medical applications, such
as detecting issues with motor functionality, fall detection, or we can also use it for
motion capture for our video games and blockbuster movies. So just performing the pose estimation part
is a great invention, but relatively old news. So what’s really new here? Why is this work interesting? How does it go beyond pose estimation? Well, as a hint, the title contains an additional
word, “controllable”, so, look at this! Woo-hoo! As you see, this technique can not only
identify where a character is, but it also lets us grab a controller and move the character around! This means making this character perform novel
actions, and showing it from novel views. That’s really remarkable, because this requires
a proper understanding of the video we’re watching. And this means that we can not only watch
these real-world videos, such as this small piece of footage used for the learning, but
by performing these actions with a controller, we can also make a video game out of them, especially given that here, the background
has also been changed. To achieve this, this work contains two key
elements. Element number one is the pose2pose network
that takes an input posture and the button we pushed on the controller, and creates the
next step of the animation. Element number two, the pose2frame
architecture, then blends this new animation step into an already existing image. The neural network that performs this is trained
so that it is encouraged to create these character masks in a way that is continuous
and doesn’t contain jarring jumps between the individual frames, leading to smooth and
believable movements. Now, clearly, anyone who takes a cursory look
sees that the animations are not perfect and still contain artifacts, but keep in mind
that this paper is among the first works on this problem. Imagine what we’ll have two more papers
down the line. I can’t wait. Thanks for watching and for your generous
support, and I’ll see you next time!
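
For Fellow Scholars who like to see ideas in code, the two elements described above can be sketched as a toy in Python. This is only a minimal sketch under loud assumptions, not the paper's implementation: the real pose2pose and pose2frame are trained neural networks, while here pose2pose simply shifts the joints by the controller input, and pose2frame is assumed to blend with a per-pixel mask (mask * character + (1 - mask) * background). All names (Pose, render_mask, mask_jump, character_value) are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Pose = List[Tuple[float, float]]  # hypothetical: 2D joint positions (x, y)
Image = List[List[float]]         # grayscale image with values in [0, 1]


def pose2pose(pose: Pose, control: Tuple[float, float]) -> Pose:
    """Toy stand-in for the learned pose2pose step.

    Assumption: the controller input is a (dx, dy) offset and the next
    pose is the current pose shifted by it. The real network is learned.
    """
    dx, dy = control
    return [(x + dx, y + dy) for x, y in pose]


def render_mask(pose: Pose, height: int, width: int, radius: float = 1.5) -> Image:
    """Toy character mask: 1.0 within `radius` of any joint, else 0.0."""
    mask = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if any((c - x) ** 2 + (r - y) ** 2 <= radius ** 2 for x, y in pose):
                mask[r][c] = 1.0
    return mask


def pose2frame(pose: Pose, background: Image,
               character_value: float = 1.0) -> Tuple[Image, Image]:
    """Toy stand-in for pose2frame: blend the character into an existing image.

    Assumed blending rule: frame = mask * character + (1 - mask) * background.
    """
    height, width = len(background), len(background[0])
    mask = render_mask(pose, height, width)
    frame = [
        [mask[r][c] * character_value + (1.0 - mask[r][c]) * background[r][c]
         for c in range(width)]
        for r in range(height)
    ]
    return frame, mask


def mask_jump(prev_mask: Image, next_mask: Image) -> float:
    """Mean squared change between two consecutive masks.

    A training loss could penalise this value to discourage jarring
    jumps between frames (an assumption about how smoothness might be
    encouraged, not the paper's exact loss).
    """
    height, width = len(prev_mask), len(prev_mask[0])
    total = sum((next_mask[r][c] - prev_mask[r][c]) ** 2
                for r in range(height) for c in range(width))
    return total / (height * width)


# Usage: one "button press" moves the character one pixel to the right.
background = [[0.2] * 8 for _ in range(8)]
pose = [(2.0, 2.0), (2.0, 4.0)]
next_pose = pose2pose(pose, (1.0, 0.0))
frame_before, mask_before = pose2frame(pose, background)
frame_after, mask_after = pose2frame(next_pose, background)
jump = mask_jump(mask_before, mask_after)
```

Because the blend only touches pixels where the mask is non-zero, swapping in a different background changes the scene without changing the character, which is one way to picture the background replacement mentioned earlier.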

