Adventures in Motion Capture: Using Kinect Data (Part 1)

July 1, 2016

Three weeks ago I wrote about my travails with setting up the Microsoft Kinect v2 sensor to record motion capture for my own game development needs. At the time, I mentioned I’d follow up with my experiences using the different motion capture software packages that are currently available. Instead, I’m back this week to write about my own spelunking into the data received directly from the Kinect itself: how to get that data and how to apply it to my own models.

So why the change in direction? Two reasons.

First, I’m a computer guy and I love playing around with new bits of kit. I became fascinated by the Kinect software development kit and wanted to have some fun with the data from the Kinect sensor myself.

Second, I did start to look at a number of motion capture packages out there and for various reasons found I either couldn’t use them or was unwilling to. One important factor was that after the difficulties I had getting the hardware set up, I wanted to make sure that any software I used would indeed work well. So before plunking down any money, I naturally wanted to give a demo/trial a go. Anyway, here’s a quick rundown of the ones I looked at:

  • Brekel – Brekel’s trial version only allows for four seconds of recording time. I passed this one by as, given how my computer and Kinect sensor are currently positioned, I didn’t think I’d be able to start the recording and then get into position in time to actually capture any data. Ten seconds probably would have done it, but not four.
  • nuiCapture – nuiCapture Animate has a thirty-second recording limit, so I thought this one would work out for me. However, the software would not run on Windows 10, erroring out immediately. After a bit of further research, I found this software was released prior to the Kinect v2 sensor, so even if it had run on Windows 10, it likely wouldn’t have worked with the v2 sensor (having been developed for the Kinect 360 sensor instead).
  • NI Mate – I gave the trial version for NI Mate a try and indeed it could use the Kinect sensor and bring data into it. However, I discovered the trial version doesn’t allow for saving data out to a file. Without the ability to save to a file I was unable to test the pipeline of recording data, applying it to my test model’s skeleton, and then trying it in game. Without “proof of pipeline” this was another application I had to pass by.
  • Fastmocap – Quickly passed this one by as (a) it only supports the Kinect 360 sensor not the Kinect v2 sensor and (b) they oddly don’t have a trial version due to “some security reasons” (which sounds more than a little dodgy).
  • Kinect BVH – Free motion capture software but it only works with the Kinect 360 sensor not the v2. Another pass.
  • iPi Soft – I’ve read good things about this software but I decided to pass on giving it a try at this time. This software offers a 30-day trial, after which it switches to a subscription pricing model. Now, I’m OK with plunking down a couple of hundred dollars for a good piece of software. I’m less OK with having to do that every year. I’m leaving this open as an option to look at in the future, but I’m passing over it for now (got to save that trial period for when I better know what I’ll need for my game).

Skeletal Animation Basics

A little bit of preamble before we get to actually using data from the Kinect sensor. This is basic 3D animating stuff. If you’re looking into motion capture tools you’re probably already familiar with it, but if not…

[A 3D model from Robyn HUD (WIP).]

To start with, we have a 3D model of some sort that we want to animate. That model is made up of a bunch of vertices, or individual points in space that, when connected, give us a basic representation of what the model looks like. To animate the model we need to move where all those points in space are. For example, if we want to have a character hold its arm out in front of it, we swing all the vertices that make up that arm up and forward.

[All vertices related to the right arm rotate about the right shoulder.]

It would be a lot of very repetitive work to figure out how to move each vertex individually. However, we can simplify the amount of work needed by recognizing that vertices can be moved in batches. For example, when the bone of a real person’s upper arm moves, all the muscle and skin and goopy insides surrounding that bone move in the same way. Specifically, all of that material rotates through space as the person’s shoulder rotates.
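
If it helps to see that in code: rotating a batch of vertices about a pivot like the shoulder just means rotating each vertex’s offset from that pivot. Here’s a toy sketch in C# using System.Numerics (the names and numbers are invented for illustration, not taken from any real model):

    using System;
    using System.Numerics;

    static class ArmSketch
    {
        // Rotate every vertex in a batch about a pivot joint: take each
        // vertex's offset from the pivot, rotate it, then move it back.
        static Vector3[] RotateAboutPivot(Vector3[] vertices, Vector3 pivot, Quaternion rotation)
        {
            var result = new Vector3[vertices.Length];
            for (int i = 0; i < vertices.Length; i++)
                result[i] = pivot + Vector3.Transform(vertices[i] - pivot, rotation);
            return result;
        }

        static void Main()
        {
            // One "elbow" vertex 0.3 below a shoulder at the origin, swung 90 degrees.
            var swung = RotateAboutPivot(
                new[] { new Vector3(0, -0.3f, 0) },
                Vector3.Zero,
                Quaternion.CreateFromAxisAngle(Vector3.UnitX, (float)(Math.PI / 2)));
            Console.WriteLine(swung[0]);
        }
    }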

It’s this idea of rotation that leads into skeletal animation. For our 3D model we can create a simple “skeleton” by defining points at each of the major places that can bend (rotate), like so:

[A skeleton can be defined for a 3D model to reduce the number of points we have to worry about.]

Those points represent the joints of our model’s skeleton. As a given joint rotates, we can rotate all the vertices related to it around that joint. We can also rotate the position of any joint attached to that joint. So if we rotate the shoulder of a model so that the upper arm points forward, we can apply the same rotation to the lower arm so that it follows along. We can then add an additional rotation to the lower arm to allow it to point up.

[We just need to rotate the key joints and let the computer figure out the math for the vertices.]

Basically, if we have a list of rotations that we can apply to the different joints in a skeleton then we can cause a 3D model to move into whatever pose we require of it.
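
To make that a little more concrete, here’s a minimal sketch of the idea in C# with System.Numerics. The three-joint chain and its numbers are made up; the takeaway is that each joint stores only a local rotation, and the final positions fall out of composing those rotations down the hierarchy:

    using System;
    using System.Numerics;

    // A toy three-joint chain (shoulder -> elbow -> wrist) showing the core
    // of skeletal animation.
    static class PoseSketch
    {
        static void Main()
        {
            // Each joint's offset from its parent in the rest pose (arm hanging down).
            Vector3[] offsets =
            {
                new Vector3(0, 0, 0),      // shoulder (root of this chain)
                new Vector3(0, -0.3f, 0),  // elbow, 0.3 below the shoulder
                new Vector3(0, -0.3f, 0),  // wrist, 0.3 below the elbow
            };

            // The pose: one local rotation per joint. The shoulder swings the whole
            // arm 90 degrees about the X axis; the elbow bends 90 degrees back.
            float quarterTurn = (float)(Math.PI / 2);
            Quaternion[] localRotations =
            {
                Quaternion.CreateFromAxisAngle(Vector3.UnitX, quarterTurn),
                Quaternion.CreateFromAxisAngle(Vector3.UnitX, -quarterTurn),
                Quaternion.Identity,
            };

            // Walk the chain, accumulating the parent rotations as we go.
            Quaternion worldRotation = Quaternion.Identity;
            Vector3 worldPosition = Vector3.Zero;
            for (int i = 0; i < offsets.Length; i++)
            {
                // A joint's position is its parent's position plus its rest-pose
                // offset rotated by everything above it in the chain.
                worldPosition += Vector3.Transform(offsets[i], worldRotation);
                // Apply the joint's local rotation first, then the parent rotations.
                worldRotation = Quaternion.Concatenate(localRotations[i], worldRotation);
                Console.WriteLine($"joint {i}: {worldPosition}");
            }
        }
    }

Run it and you’ll see the elbow swing out in front of the shoulder while the wrist simply follows along to hang from the elbow’s new position, with nothing but per-joint rotations driving it.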

The Kinect Skeleton

The Kinect sensor is essentially a fancy camera that is able to “see” depth information. When a person walks into the view of the Kinect sensor, internally the sensor determines a basic skeleton for that person similar to how we could create a skeleton for a 3D character model as described above.

The closer our 3D character model’s skeleton resembles the Kinect’s skeleton, in terms of the joints used and how they’re connected to one another, the better our animation results will be. The Kinect v2 skeleton consists of 25 joints. Each joint is assigned a specific number to identify it. The Kinect skeleton is arranged as follows:

[The Kinect v2 joint hierarchy.]

It’s important to know the hierarchy of the joints. The ultimate parent of all joints is joint 0, SpineBase. Child joints spread outwards from the SpineBase joint. For example, SpineMid, HipLeft, and HipRight are all children of SpineBase. This will become important when we process the Kinect data in order to apply it to our own 3D models.

Important: As mentioned, the v2 sensor records data for 25 joints. The older Kinect v1 sensor (the one used with the Xbox 360) records fewer joints. This is why SpineShoulder (joint 20) seems out of place as far as its index number goes: it was added to the skeleton for the v2 sensor and isn’t present in the v1 sensor’s skeleton.
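
For reference, here’s how those 25 joints are numbered in the SDK’s C# API, via the JointType enumeration in the Microsoft.Kinect namespace. You can see SpineShoulder sitting at 20, after the leg joints:

    // The 25 joints tracked by the Kinect v2, as numbered by the SDK.
    public enum JointType
    {
        SpineBase = 0,      SpineMid = 1,      Neck = 2,          Head = 3,
        ShoulderLeft = 4,   ElbowLeft = 5,     WristLeft = 6,     HandLeft = 7,
        ShoulderRight = 8,  ElbowRight = 9,    WristRight = 10,   HandRight = 11,
        HipLeft = 12,       KneeLeft = 13,     AnkleLeft = 14,    FootLeft = 15,
        HipRight = 16,      KneeRight = 17,    AnkleRight = 18,   FootRight = 19,
        SpineShoulder = 20,
        HandTipLeft = 21,   ThumbLeft = 22,    HandTipRight = 23, ThumbRight = 24,
    }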

The Kinect SDK Samples

One of the requirements of setting up the Kinect v2 sensor on a Windows machine is that the Kinect 2.0 SDK must be downloaded and installed. The fun thing about this is that the SDK includes a bunch of simple sample applications that highlight the different features of the Kinect. Typically these samples involve gathering some subset of the data from the Kinect and displaying it on screen. Among the samples there are applications for displaying the raw depth data the sensor sees, the joint and bone positions of people standing in front of the sensor, the regular video camera data from the sensor, facial recognition, and so on.

I was most interested in the joint and bone position data as that’s what’s needed to animate a 3D character. The SDK sample that captures and displays this data (in the form of an animated stick figure on screen) is the Body Basics application.

I won’t go through the Body Basics application line by line as that’s what the SDK sample code is for. However, in summary the sample application does the following (see the sketch after the list):

  • Connects to the Kinect sensor.
  • Retrieves frames of 3D body data from the Kinect sensor.
  • Converts the 3D body data into 2D monitor screen coordinates.
  • Draws a stick figure skeleton using the converted 2D points.
  • Closes the Kinect sensor when the application ends.
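
Stripped down to that outline, the flow looks roughly like this in C#. To be clear, this is my own condensed sketch rather than the sample’s actual code; the types and calls (KinectSensor, BodyFrameReader, GetAndRefreshBodyData) come from the SDK’s C# API, but the structure is simplified and the drawing is omitted:

    using System;
    using Microsoft.Kinect;

    static class BodyBasicsSketch
    {
        static void Main()
        {
            // Connect to (open) the default sensor.
            KinectSensor sensor = KinectSensor.GetDefault();
            sensor.Open();

            // The sensor supports a fixed number of bodies (six on the v2).
            Body[] bodies = new Body[sensor.BodyFrameSource.BodyCount];

            // Retrieve frames of body data as they arrive.
            BodyFrameReader reader = sensor.BodyFrameSource.OpenReader();
            reader.FrameArrived += (s, e) =>
            {
                using (BodyFrame frame = e.FrameReference.AcquireFrame())
                {
                    if (frame == null) return; // frame already expired; skip it
                    frame.GetAndRefreshBodyData(bodies);
                    // ...map joint positions to 2D and draw the stick figures...
                }
            };

            Console.ReadLine(); // let frames pump until Enter is pressed

            // Close the sensor when the application ends.
            reader.Dispose();
            sensor.Close();
        }
    }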

The deceptive part of this sample application is the step that converts the 3D data to 2D for display on screen. The Kinect tracks the joints of a person (knees, shoulders, elbows, etc.) both as positions in three-dimensional space and as rotations that indicate how those joints are bent. The sample application uses the positions of the joints as opposed to their rotations. This is fine if all we want to do is what the sample does: draw where the joints are and then connect the dots with lines.

However, as far as animation goes, what we really want is the rotation information. As I described in the Skeletal Animation Basics section, what we’re after is a list of rotations that we can apply to each joint in order to pose our own 3D model. In this case, that list will hold the orientation of each joint on the person the Kinect is observing.

Getting the joint orientations is actually very easy. The sample application retrieves an instance of the BodyFrame class for each frame recorded from the Kinect sensor. The BodyFrame class provides a GetAndRefreshBodyData method which is used to retrieve a list of all the bodies in view of the camera for that frame. The Kinect v2 automatically tracks up to six bodies in view, complete with skeleton data for each (it was the older v1 sensor that limited full skeleton tracking to two bodies).

Once the sample application has retrieved the list of bodies, which are instances of the Body class, it loops through all of them looking for the ones where the IsTracked property is true. These are the bodies for which skeleton information is available.

At this point, the sample application uses the Joints property of the body it’s processing to determine the position of each joint in 3D space. Those positions are mapped to 2D screen coordinates and a pseudo-skeleton is drawn for the body.

As mentioned, we want the rotation information, not the position information. The rotation data is also available on the Body objects we get from the sensor. Instead of accessing the Joints property, we want to access the JointOrientations property.
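
Concretely, the difference between the two paths looks something like this fragment (my own illustration, not the sample’s code; sensor is assumed to be an open KinectSensor and body a tracked Body):

    using Microsoft.Kinect;

    static class JointPaths
    {
        static void Compare(KinectSensor sensor, Body body)
        {
            // Position path (what Body Basics uses): the joint's 3D camera-space
            // position, mapped down to 2D depth-space coordinates for drawing.
            CameraSpacePoint position = body.Joints[JointType.ElbowLeft].Position;
            DepthSpacePoint point2d =
                sensor.CoordinateMapper.MapCameraPointToDepthSpace(position);

            // Orientation path (what we want for animation): the same joint's
            // rotation as a quaternion packed into a Vector4 (X, Y, Z, W).
            Vector4 rotation = body.JointOrientations[JointType.ElbowLeft].Orientation;
        }
    }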

The JointOrientations property is a collection covering the 25 joints the Kinect v2 sensor tracks. Each element in the collection has two properties of its own:

  • JointType – the identifier of the joint, matching the numbering shown in the Kinect Skeleton section above.
  • Orientation – a Vector4 holding a quaternion representation of how the joint is rotated (quaternions are beyond the scope of this blog; suffice it to say they’re a compact and convenient way to represent and work with rotations in three-dimensional space).

My first step in working with the Kinect data was to just get the basic orientation data written out to a file. I started with the original SDK example, but changed it so that on each frame it would stream the following information out to a file (a sketch of this follows the list):

  • the number of tracked bodies seen that frame
  • the 25 rotation quaternions in order from 0 to 24 for each tracked body
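
Here’s a minimal sketch of what that per-frame logging might look like, called from the frame handler after GetAndRefreshBodyData has run. The StreamWriter and the one-number-per-line format are my own choices for illustration, not anything the SDK prescribes:

    using System.IO;
    using System.Linq;
    using Microsoft.Kinect;

    static class MocapLogger
    {
        // Called once per body frame. 'bodies' is the array refreshed by
        // GetAndRefreshBodyData; 'writer' is a StreamWriter opened elsewhere.
        public static void LogFrame(Body[] bodies, StreamWriter writer)
        {
            var tracked = bodies.Where(b => b != null && b.IsTracked).ToList();
            writer.WriteLine(tracked.Count);

            foreach (Body body in tracked)
            {
                // Joints 0 through 24, in JointType order; one quaternion per line.
                for (int i = 0; i < 25; i++)
                {
                    Vector4 q = body.JointOrientations[(JointType)i].Orientation;
                    writer.WriteLine($"{q.X} {q.Y} {q.Z} {q.W}");
                }
            }
        }
    }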

Getting the raw animation data out to a file was a critical first step. After all, in any final game, the animation data, properly cleaned and processed, will need to be read in from a file in order to be used by that game. But this blog’s getting long enough already. Pop back next week when I’ll go through the specifics of the Kinect joint orientations, how they should be applied to an actual 3D model, and a few other odds and ends along the way.