Adventures in Motion Capture: Using Kinect Data (Part 3)

July 15, 2016

Having already covered the basics of recording joint information with the Kinect v2 sensor and applying that data to a custom 3D character model for animation, this week I’m going to wrap things up with a few last tweaks. In particular, I’ll go through using the Kinect’s floor data to ensure characters are standing straight and the Kinect’s positional data to allow for animations that involve vertical movement, such as jumping.

Tilted Skeletons

When the Kinect sensor records its surroundings, it uses itself as the origin of the three-dimensional space being recorded. Within this space we have our standard X, Y, and Z axes. To best understand these, imagine you are a Kinect sensor looking into the world.

[Camera space with the Kinect parallel to the floor.]

  • The X axis (not shown above) increases to your left and decreases to your right.
  • The Y axis increases going up and decreases going down.
  • The Z axis increases in front of you and decreases behind you.

All well and good, but as I mentioned, these axes are defined with the Kinect as the origin. Now imagine what happens if you raise and lower your head.

[Camera space with the Kinect pointing downwards.  Note how the model head is closer on the Z axis than the feet.]

Here we have the Kinect sensor tilted so that it’s pointing slightly downwards. The sensor doesn’t account for that though. It still considers going straight away from it to be the Z axis and going up from its top to be the Y axis. As shown, these directions no longer match with the true Y and Z. The result of this is that if we apply the raw Kinect orientation information to our model then the model will be leaning forwards if the sensor is pointed down and backwards if the sensor is pointed up.

The easy way to correct for this tilt is to simply ensure that the Kinect sensor is adjusted so as to be level with the surface the person being recorded is standing on. However, the Kinect also provides us with data that we can use to programmatically account for this tilt.

From the Kinect 2.0 SDK we previously used the Body Basics sample application to record orientation data from the sensor. This was done using the BodyFrame class, which provides access to both the position and orientation data of joints. It also provides information about the floor a person is standing on if the floor is in the field of view of the camera.

On each frame in which body data is retrieved from the sensor, the BodyFrame method get_FloorClipPlane can be used to retrieve a Vector4 that defines the plane the bodies are standing on. The general equation of a plane is:

Ax + By + Cz + D = 0

In our case, the Kinect sensor returns the values for A, B, C, and D in that Vector4. These values are returned in “Hessian normal form” which is a fancy way to say that the values have been scaled such that A, B, and C define a unit length vector that is the plane’s normal. It’s this unit vector that we’re most interested in.

[From the point of view of the camera the floor normal is pointing uppish and back instead of just up.]

If we consider that the subject being recorded by the sensor is standing on the floor (and the floor is more or less level), then the person will be standing upright and the normal to the floor will point straight up as well. If we then introduce the sensor into the equation, we can see that both the normal and the person will appear tilted towards or away from the camera by the same amount. From the point of view of our camera, there is some angle between the floor normal and a true straight-up Y vector. We therefore need to rotate each of the orientations we receive from the Kinect by that angle to bring them into true.

To do this we need to define two vectors in our code. The first vector is just a basic “up” vector defined as (0, 1, 0). The second is the floorNormal vector which is of unit length and is obtained by taking the first three components from the Vector4 returned by get_FloorClipPlane.

Important: The Kinect sensor can only return the floor plane if it can actually see the floor in its field of view. If the sensor is tilted up, or used in a really confined space, it may not be able to perceive the floor. In this case, the values of the Vector4 are all returned as 0. Just something to test for to ensure things don’t go sproing.
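Here is a minimal Python sketch of pulling the floor normal out of the plane and testing for the "no floor" case. It assumes the four plane values have already been copied out of the Vector4 into a plain (A, B, C, D) tuple. The function name and the eps tolerance are my own inventions for illustration, not part of the Kinect SDK.

```python
import math

def floor_normal_from_clip_plane(plane, eps=1e-6):
    """Return the unit floor normal (A, B, C), or None if no floor was seen.

    `plane` is assumed to be an (A, B, C, D) tuple copied from the
    Vector4 that the sensor returns for the floor clip plane.
    """
    a, b, c, _d = plane
    length = math.sqrt(a * a + b * b + c * c)
    if length < eps:
        # All zeros: the sensor could not see the floor on this frame.
        return None
    # The values should already be unit length; renormalizing just
    # guards against small numerical drift.
    return (a / length, b / length, c / length)
```

When this returns None, one option is to reuse the last good normal, or to skip the tilt correction for that frame.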

First we calculate both the cross product and the dot product of these two vectors. The cross product gives us the axis that we need to rotate about to bring the floorNormal vector to point straight up like the up vector. The dot product gives us the amount of rotation that we need.

axis = up × floorNormal

dot = up · floorNormal

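To get the angle of rotation, we take the arc cosine of the dot. From that angle and the axis we already calculated, we can define a quaternion that we can use to rotate the Kinect orientations to eliminate model tilt. If your quaternion constructor expects a unit-length axis, normalize the axis first, since the cross product of two unit vectors has a length equal to the sine of the angle between them, not 1.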

angle = RadToDeg(ArcCos(dot))

Note that we convert the angle from radians to degrees here.

floorQuat = Quaternion(axis, angle)
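The steps above can be sketched in plain Python as follows. This is an illustration under a few assumptions of mine, not the exact code I used: quaternions are stored as (w, x, y, z) tuples, rotations follow the right-handed convention, and angles stay in radians because Python's math functions use them. With that convention the cross product is taken as floorNormal × up so that the resulting rotation carries floorNormal onto up; a library with the opposite rotation sense would use the operand order shown in the text. The helper names are hypothetical.

```python
import math

UP = (0.0, 1.0, 0.0)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def floor_quat(floor_normal):
    """Unit quaternion (w, x, y, z) that rotates floor_normal onto UP."""
    axis = cross(floor_normal, UP)
    length = math.sqrt(dot(axis, axis))
    if length < 1e-9:
        # Normal already points straight up, so no correction is needed.
        # (A normal pointing straight down is not handled in this sketch.)
        return (1.0, 0.0, 0.0, 0.0)
    # Clamp to guard acos against tiny floating point overshoot.
    d = max(-1.0, min(1.0, dot(UP, floor_normal)))
    angle = math.acos(d)  # radians
    # Dividing by `length` normalizes the axis.
    s = math.sin(angle / 2.0) / length
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)

def rotate(q, v):
    """Rotate vector v by unit quaternion q (w, x, y, z)."""
    w, u = q[0], q[1:]
    uv = cross(u, v)
    uuv = cross(u, uv)
    return tuple(v[i] + 2.0 * (w * uv[i] + uuv[i]) for i in range(3))
```

A quick sanity check is to rotate the floor normal itself: rotate(floor_quat(n), n) should come out very close to (0, 1, 0).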

By applying this rotation to all of the Kinect orientations we receive, we rotate the skeleton information such that it’s now standing upright from the point of view of the sensor. This gives us the effect we’re after; namely, it removes the tilt from the skeleton.
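Applying the correction to every joint amounts to one quaternion multiplication per joint. The sketch below assumes quaternions are (w, x, y, z) tuples and that the floor rotation is applied on the left, which is the usual order when the correction should happen in sensor space. The joint dictionary and function names are made up for illustration. Note that the Kinect's own Vector4 stores components as x, y, z, w, so they would need reordering first.

```python
def quat_mul(a, b):
    """Hamilton product a * b of two (w, x, y, z) quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def untilt(floor_q, joint_orientations):
    """Pre-multiply every joint orientation by the floor correction.

    `joint_orientations` maps a joint name to its (w, x, y, z) quaternion.
    """
    return {name: quat_mul(floor_q, q)
            for name, q in joint_orientations.items()}
```

If your quaternion library uses the opposite multiplication order, swap the two arguments to quat_mul.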

One final note: You may recall from last time that in my particular case, because I was using MilkShape 3D to create my test model, I had to rotate the Kinect orientations such that the Y axis was mapped onto the Z axis. When applying correction for the floor tilt, that correction should be applied after the YZ swap.


Up to this point we’ve only concerned ourselves with the orientation data from the Kinect, as that’s what we need to pose the skeleton of our 3D model correctly. Given differences in size and shape, we can’t really use the joint position data the Kinect provides, except in one instance: the SpineBase joint.

[The Kinect v2 joint hierarchy.]

The SpineBase joint is the ultimate ancestor of every other joint in the skeleton. By moving this one point about we automatically move all the other points in the skeleton by the same amount. This is especially useful for allowing us to do animations where the body travels vertically, such as when the person we’re recording jumps.

Right now, we’re not doing anything with the position of SpineBase in our test model. SpineBase is our starting point and all we do is apply its rotation to any vertices related to it. This allows those vertices to twist and turn in space but doesn’t otherwise allow them to move up and down. If we were to apply a jumping animation at this point, we’d see the body move as if it were jumping but it wouldn’t actually change height. Let’s address that, shall we?

First, revisiting our SDK sample application, we’ve already used the GetJointOrientations method of the Body class (which we in turn get from the BodyFrame class) to retrieve the orientations of the joints. Now we’re also going to use the GetJoints method to get the positions of the joints themselves. The joints are returned to us in a list of Joint objects, one for each joint the Kinect tracks, the same as for the orientations. For each joint we can access the Position property and get its X, Y, and Z sub properties.

Even though we get the positions for all of the joints, we’re really only interested in the SpineBase position. From that we can tell how much the skeleton as a whole is moving up and down. What we need to do is apply that movement to our test model skeleton in a reasonable way. We can’t just apply it directly. For example, suppose we record a really tall person but we apply the animation to a really short model. If we used the change in position directly it would look like our short model had a super jump. What we want to do is apply the change in position of the recorded subject proportional to the height of our actual test model.

There are a number of ways to do this. For my playing around, I went the simple route. I assume that the person being recorded starts in a more or less neutral standing position, one where their SpineBase point is its regular distance from the floor the person is standing on.

The first goal is to determine how far away from the floor plane the initial SpineBase is. Yep, the same floor plane we used above to straighten out the tilt in the model from tilt in the camera.

By taking the position of the SpineBase and substituting it into the floor plane equation we get how far away from the plane the SpineBase position is.

distance = Ax + By + Cz + D

Here, A, B, C, and D come from our floor plane and x, y, and z are from the SpineBase position. We hold onto this distance and, because we’re assuming the person we’re recording is starting standing in a neutral pose, we say that this distance is equivalent to the distance from the origin to SpineBase in our 3D test model.
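As a small sketch of the substitution, assuming the plane is an (A, B, C, D) tuple and the SpineBase position is an (x, y, z) tuple in the same camera space (the function name and the sample numbers are mine):

```python
def height_above_floor(plane, point):
    """Signed distance from `point` to the floor plane.

    Because (A, B, C) is a unit-length normal, plugging the point into
    Ax + By + Cz + D gives the distance directly, in the same units as
    the point (meters for Kinect camera space).
    """
    a, b, c, d = plane
    x, y, z = point
    return a * x + b * y + c * z + d

# Made-up example: a level sensor 0.8 m above the floor, with the
# SpineBase 0.2 m above the sensor and 2 m in front of it.
initial_height = height_above_floor((0.0, 1.0, 0.0, 0.8), (0.1, 0.2, 2.0))
```

In this made-up case initial_height works out to 1 meter.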

[Applying a proportional difference in the SpineBase move allows the model to jump.]

On subsequent frames in our recorded animation we run the SpineBase position through the plane equation to figure out how far away from the floor plane the skeleton currently is in Kinect space. When the person jumps up, the distance will increase from the initial distance we stored. When the person ducks, the distance will decrease. In either case, we can calculate the percentage of their initial height like so:

percentHeight = currentHeight / initialHeight

Suppose the initial SpineBase height is 1 meter. On a later frame, the person then ducks so their SpineBase is at 50 centimeters (half a meter). Our calculation above will give us a percentHeight of 50 / 100 = 0.5. All we have to do is multiply the initial SpineBase position of our 3D test model by 0.5 and our model will move down by half of its SpineBase height regardless of the height of the actual person we’re recording.
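In code, the scaling might look like the sketch below. The name model_rest_height is hypothetical and stands for the distance from the origin to SpineBase in the 3D test model, in whatever units the model uses.

```python
def scaled_model_height(current_height, initial_height, model_rest_height):
    """Scale the model's SpineBase height by the recorded height ratio."""
    if initial_height <= 0:
        raise ValueError("initial_height must be positive")
    percent_height = current_height / initial_height
    return model_rest_height * percent_height

# The person ducks from 1 m to 0.5 m; the model's SpineBase normally
# sits 40 units above its origin (a made-up value).
new_height = scaled_model_height(0.5, 1.0, 40.0)
```

Here the model's SpineBase drops from 40 units to 20, half its rest height, no matter how tall the recorded person is.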

And that’s it. By applying proportional changes in the height of the SpineBase joint instead of raw changes in height, our test model will move vertically in the same way as the person being recorded but at a scale appropriate for the test model. If we wanted to, we could apply a similar technique to the X and Z movements of our test model. This could be useful for having the model move along at an appropriate rate where the person we’re recording is walking or running, for example.

Limb Rotations

One last topic I wanted to briefly touch on is that of limb rotations. I’ve left it for the end because it’s something I haven’t dealt with yet in my own experiments with the Kinect data. However, if you’re playing with Kinect data you’ve probably already encountered the issue of bad limb rotations.

The Kinect sensor works quite well for recording body information, but it’s not perfect. Arms and legs can cause it some problems, especially the forearms.

If you stick your arms straight out in front of you, palms pointed in, and then rotate them so they’re still straight out but your palms are pointed outwards, you’ll see that there’s really very little physical difference in how your arms (ignoring your hands) appear. These types of bone-aligned rotations can cause the Kinect issues: from one frame to the next it may decide that a person’s arm has suddenly rotated 180 degrees, and then on a subsequent frame it may rotate the arm back again.

I haven’t done anything yet to try to filter out these bad rotations, so I have no real recommendations on how to handle this problem. Just be aware that if you’re seeing this happen with your Kinect data, it is more than likely coming from the Kinect itself and is not a mistake in any code you may have written.


It took a while to get through all the issues (three blog posts, to be exact), but we now have motion capture data recorded from real people using the Kinect that we can reasonably apply to our virtual 3D models. A few issues remain (mostly just the bad limb rotations) and doubtless many improvements could be made, but hopefully this miniseries will allow you to have some fun with your Kinect. Good luck!