Meta patents: predicting full-body posture with machine learning and inverse kinematics

(Nweon, March 19, 2022) Hollywood-grade, high-precision full-body motion capture is expensive and complicated to set up. Andrew Bosworth, the associate chief technology officer of Meta who oversees the Reality Labs business group, previously said that Quest 2's current inside-out tracking cannot support full-body motion capture and that achieving it in the future will be difficult. Even so, the company is still actively exploring the problem and hopes to bring full-body motion capture to consumer headsets as soon as possible.

Recently, the United States Patent and Trademark Office published two Meta patent applications, "Systems and methods for predicting elbow joint poses" and "Systems and methods for predicting lower body poses". Both inventions rest on the same principle: using machine learning models and inverse kinematics to predict joint poses and infer the full-body posture.

Because the joints and muscles of the human body move in a correlated way, the system can predict the posture of one joint, infer the other joints from it using an inverse kinematics skeletal model, and then integrate them into a complete body posture.

In simple terms, for elbow posture prediction, the device determines the head posture and captures images of the forearm/wrist with its camera. A trained machine learning model then predicts the elbow posture from this information and from the correspondence between the forearm/wrist and the elbow.

For full-body posture prediction, the upper body posture is first inferred with the method above from the head posture and the captured forearm/wrist images. A trained machine learning model then infers the entire lower body posture from the upper body posture. Finally, the system combines the upper and lower body postures.

FIG. 2 shows an exemplary body posture associated with the user 102. The computing system 108 may generate body posture 200 associated with the user 102. Body posture 200 includes an inverse kinematics skeletal frame, which may include a list of one or more joints. In a particular embodiment, the body posture includes joint postures associated with the user 102, such as, but not limited to, head posture 210, wrist posture 220, elbow posture 230, shoulder posture 240, neck posture 250, upper spine posture 260, lower spine posture 270, hip posture 280, knee posture 290, or ankle posture 295.

1. Elbow and upper body position

Figure 3A shows an image of an arm captured by the headset camera within its limited field of view, in which 102 is the user's left arm and 106 is the controller held in the left hand.

In Figure 3B, the system may use a technique such as Mask R-CNN to generate a segmentation mask that divides the image into multiple regions. The segmentation mask can be represented as a two-dimensional matrix in which each element corresponds to a pixel in the input image.
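
To make the matrix representation concrete, here is a minimal sketch in Python. The label values (0 for background, 1 for forearm, 2 for controller) and the `region_centroids` helper are assumptions made for illustration; the article does not say how the regions are encoded, and no real Mask R-CNN model is run here.

```python
# Hypothetical region labels; the source does not specify an encoding.
BACKGROUND, FOREARM, CONTROLLER = 0, 1, 2

def region_centroids(mask, background=BACKGROUND):
    """Return {label: (mean_row, mean_col)} for every non-background region."""
    sums = {}
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label == background:
                continue
            total_r, total_c, count = sums.get(label, (0, 0, 0))
            sums[label] = (total_r + r, total_c + c, count + 1)
    return {label: (tr / n, tc / n) for label, (tr, tc, n) in sums.items()}

# A toy 4x6 "image": each element is the region label of the matching pixel.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 2, 2],
    [0, 1, 1, 1, 2, 2],
    [0, 0, 0, 0, 0, 0],
]

centroids = region_centroids(mask)
print(centroids[FOREARM], centroids[CONTROLLER])
```

Because every matrix element maps to one pixel, simple per-region statistics such as these centroids can be read directly off the mask and passed on to later stages.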

In one embodiment, the system may use a nonlinear kinematics optimization solver to infer one or more joint postures. For example, the nonlinear solver can use a skeleton solver function to infer the inverse kinematics of a single frame (a single body pose).

In a particular embodiment, the nonlinear solver may use parameters and predetermined static weights to infer the body posture 200, that is, the most likely posture of the joints of the user 102 at a particular time or state. The inferred body posture may include the posture of one or more joints of the user 102, such as the elbow joint 230.
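
The idea of a weighted nonlinear solve for a single frame can be sketched roughly as follows. This is not the solver from the patent application: the planar two-segment arm, the segment lengths, the rest pose, the two cost terms, and the weight values are all assumptions chosen for illustration, and plain gradient descent stands in for whatever optimizer the real system uses.

```python
import math

UPPER_ARM, FOREARM = 0.30, 0.25   # segment lengths in metres (assumed)
W_WRIST, W_REST = 1.0, 0.001      # predetermined static weights (assumed values)
REST = (0.0, 1.0)                 # prior shoulder/elbow angles in radians (assumed)

def forward(angles):
    """Return (elbow_xy, wrist_xy) for a planar arm with the shoulder at the origin."""
    shoulder, bend = angles
    elbow = (UPPER_ARM * math.cos(shoulder), UPPER_ARM * math.sin(shoulder))
    wrist = (elbow[0] + FOREARM * math.cos(shoulder + bend),
             elbow[1] + FOREARM * math.sin(shoulder + bend))
    return elbow, wrist

def cost(angles, wrist_target):
    """Weighted sum of wrist tracking error and distance from the rest pose."""
    _, wrist = forward(angles)
    wrist_term = (wrist[0] - wrist_target[0]) ** 2 + (wrist[1] - wrist_target[1]) ** 2
    rest_term = sum((a - r) ** 2 for a, r in zip(angles, REST))
    return W_WRIST * wrist_term + W_REST * rest_term

def solve_single_frame(wrist_target, steps=2000, lr=1.0, eps=1e-6):
    """Minimise the cost by gradient descent using central-difference gradients."""
    angles = list(REST)
    for _ in range(steps):
        grad = []
        for i in range(len(angles)):
            hi = list(angles)
            lo = list(angles)
            hi[i] += eps
            lo[i] -= eps
            grad.append((cost(hi, wrist_target) - cost(lo, wrist_target)) / (2 * eps))
        angles = [a - lr * g for a, g in zip(angles, grad)]
    return angles

wrist_target = (0.35, 0.25)       # observed wrist position (made-up value)
angles = solve_single_frame(wrist_target)
elbow, wrist = forward(angles)
wrist_error = math.hypot(wrist[0] - wrist_target[0], wrist[1] - wrist_target[1])
print("inferred elbow position:", elbow, "wrist error:", wrist_error)
```

The static weights decide how the solver trades off matching the observed wrist against staying close to a plausible rest pose; the elbow is never observed directly and falls out of the solved joint angles.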

In a particular embodiment, the image data and segmentation mask may be used to evaluate the accuracy of one or more intermediate joint postures inferred by the nonlinear solver, and the solver is then updated so that it predicts those joint postures more accurately in subsequent iterations. For example, the nonlinear solver may receive one or more inputs and infer the intermediate posture of one or more joints, such as the elbow posture.

Using the principles described above, the system can infer the elbow posture from the head posture and the forearm/wrist image information, and combine these into an integrated upper body posture.

2. Lower body position

As mentioned earlier, the joints and muscles of the human body move in a correlated way. Once the integrated upper body posture has been inferred with the method above, a trained machine learning model can take it as input and, based on the correlation between the upper and lower body, infer the lower body posture, including the legs. Finally, the system can combine the upper and lower body postures.

As shown in FIG. 3, in a particular embodiment, the upper body posture 205 may be processed by a trained machine learning model to generate the corresponding lower body posture 215. Specifically, the trained machine learning model 300 may be based on a generative adversarial network (GAN) and use the upper body posture 205 to generate the lower body posture 215. The computing system may then combine the upper body posture 205 with the generated lower body posture 215.
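
The data flow described here (upper body posture in, lower body posture out, then both merged) might look something like the sketch below. The `Pose` encoding, the split of joints into upper and lower groups, and `placeholder_generator` are all hypothetical; the placeholder simply hangs the legs straight down and stands in for the trained model 300, which is not reproduced here.

```python
from typing import Callable, Dict, Tuple

Pose = Dict[str, Tuple[float, float, float]]  # joint name -> 3D position (assumed encoding)

# Assumed split of the joints between the two halves of the body.
UPPER_JOINTS = ("head", "neck", "shoulder", "elbow", "wrist", "upper_spine")
LOWER_JOINTS = ("lower_spine", "hip", "knee", "ankle")

def predict_full_body(upper: Pose, generator: Callable[[Pose], Pose]) -> Pose:
    """Run the generator on the upper body posture and merge both halves."""
    missing = [j for j in UPPER_JOINTS if j not in upper]
    if missing:
        raise ValueError(f"upper body posture is missing joints: {missing}")
    lower = generator(upper)
    full = dict(upper)
    full.update({j: lower[j] for j in LOWER_JOINTS})
    return full

def placeholder_generator(upper: Pose) -> Pose:
    """Stand-in for a trained model: places the lower joints straight below the spine."""
    x, y, z = upper["upper_spine"]
    drops = {"lower_spine": 0.2, "hip": 0.35, "knee": 0.8, "ankle": 1.25}
    return {joint: (x, y - drop, z) for joint, drop in drops.items()}

upper = {
    "head": (0.0, 1.7, 0.0),
    "neck": (0.0, 1.55, 0.0),
    "shoulder": (0.2, 1.45, 0.0),
    "elbow": (0.3, 1.2, 0.1),
    "wrist": (0.3, 1.0, 0.3),
    "upper_spine": (0.0, 1.3, 0.0),
}
full = predict_full_body(upper, placeholder_generator)
print(sorted(full))
```

Passing the generator in as a function keeps the merging step independent of how the lower body is predicted, so a trained network could replace the placeholder without changing the rest of the code.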

FIG. 4 shows a configuration for training a generative adversarial network (GAN) 400 for pose prediction. A GAN can include two independent neural networks, a Generator 405 ("G") and a Discriminator 410 ("D"). In a particular embodiment, Generator 405 and Discriminator 410 may be implemented as neural networks.

Generator 405 may be configured to receive the generated upper body posture 205 as input and to output the generated lower body posture 215. In a particular embodiment, the upper body posture 205 may be combined with the lower body posture 215 to generate a full body posture 425. In a particular embodiment, Discriminator 410 may be configured to distinguish between the "fake" full body posture 425 inferred by Generator 405 and the "real" training posture 435 from the training posture database 440. In a particular embodiment, one or more training full body postures 435 may include full body poses from one or more images.

Generator 405 and Discriminator 410 can be considered rivals: the goal of Generator 405 is to generate fake poses that can deceive Discriminator 410 (in other words, to increase the error rate of Discriminator 410), while the goal of Discriminator 410 is to correctly distinguish between "fake" poses and "real" poses.
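
The opposing goals can be illustrated with the binary cross-entropy losses commonly used to train GANs. This is a minimal sketch assuming that standard formulation (with the non-saturating generator loss); the article does not state which loss functions the patent application uses, and the probabilities below are made-up discriminator outputs rather than results from any model.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for one probability in (0, 1) and a 0/1 label."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

def discriminator_loss(p_real, p_fake):
    """D wants real poses scored near 1 and generated poses scored near 0."""
    real_term = sum(bce(p, 1) for p in p_real) / len(p_real)
    fake_term = sum(bce(p, 0) for p in p_fake) / len(p_fake)
    return real_term + fake_term

def generator_loss(p_fake):
    """G wants D to score its generated poses as real (label 1)."""
    return sum(bce(p, 1) for p in p_fake) / len(p_fake)

# D's estimated probability that each full body posture is real (made-up numbers).
p_real = [0.9, 0.8]            # scores for training postures
p_fake_weak = [0.1, 0.2]       # generated postures that D easily rejects
p_fake_strong = [0.6, 0.7]     # generated postures that mostly fool D

print(generator_loss(p_fake_weak), generator_loss(p_fake_strong))
print(discriminator_loss(p_real, p_fake_weak), discriminator_loss(p_real, p_fake_strong))
```

As the generated postures become more convincing, the generator's loss falls while the discriminator's loss rises, which is the rivalry described above expressed in numbers.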

With a trained GAN, the system can use the upper body posture to infer the lower body posture.

Related patent: Facebook Patent | Systems and methods for predicting elbow joint poses

Related patent: Facebook Patent | Systems and methods for predicting lower body poses

The Meta patent applications titled "Systems and methods for predicting elbow joint poses" and "Systems and methods for predicting lower body poses" were filed in September 2020 and published by the United States Patent and Trademark Office. Note that these are only patent applications, and how well the approach works in practice is uncertain, especially since it relies on inference and prediction.