Kinect Development (Face tracking) – Without a Kinect

In a previous post I talked about how you can use the Kinect Studio v2.0 software to “play back” a recorded file that contains Kinect data. Your application will react to the incoming data as if it were coming from a Kinect, enabling you to develop software for a Kinect without actually having the device.

This of course requires that you have a recorded file to play back. Keep reading…

More specifically, Kinect for Windows v2 supports tracking not only the bodies detected in the camera view, but FACES as well. Even better, there are a number of properties on the detected face metadata that tell you if the person is:

  • looking away from the camera
  • happy
  • mouth moving
  • wearing glasses
  • …etc…
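In the Kinect v2 SDK these surface through the face frame APIs as a dictionary of FaceProperty values on each face frame result. Here's a minimal sketch, assuming the Microsoft.Kinect.Face assembly and a trackingId you have already obtained from the body stream (treat this as a sketch, not production code):

```csharp
using Microsoft.Kinect;
using Microsoft.Kinect.Face;

KinectSensor sensor = KinectSensor.GetDefault();
sensor.Open();

// trackingId comes from a Body in the body frame stream
var faceSource = new FaceFrameSource( sensor, trackingId,
                                      FaceFrameFeatures.Happy |
                                      FaceFrameFeatures.FaceEngagement |
                                      FaceFrameFeatures.Glasses |
                                      FaceFrameFeatures.LookingAway |
                                      FaceFrameFeatures.MouthMoved );
FaceFrameReader faceReader = faceSource.OpenReader();
faceReader.FrameArrived += ( s, e ) =>
{
    using ( FaceFrame frame = e.FrameReference.AcquireFrame() )
    {
        if ( frame == null || frame.FaceFrameResult == null )
            return;

        // each property is a DetectionResult: Yes, No, Maybe, or Unknown
        var props = frame.FaceFrameResult.FaceProperties;
        DetectionResult happy = props[FaceProperty.Happy];
        DetectionResult glasses = props[FaceProperty.WearingGlasses];
        DetectionResult lookingAway = props[FaceProperty.LookingAway];
    }
};
```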

Here at IK, we have been doing a lot of Kinect work lately. It turns out the Kinect v2 device and driver are super picky when it comes to compatible USB 3 controllers. We have discovered that our laptops (Dell Precision m4800) do not have one of the approved controllers. Through lots of development trial and error, we have narrowed this down to being a problem mainly with FACE TRACKING (the rest of the Kinect data and functionality seems to work fine).

So … even though I have a Kinect, if I’m working on face tracking, I’m out of luck on my machine in terms of development. However, using the technique described in the previous post, I can play back a Kinect Studio file and test my software just fine.

To that end, we have recorded a short segment of a couple of us in view – with and without our faces engaged, happy, looking at the camera and away – and posted it here for anyone to use in their Kinect face tracking software. This recording has all the feeds turned on, including RGB, so it’s a HUGE file. Feel free to download it (below) and use it for your Kinect face tracking development.

DOWNLOAD HERE: Kinect recorded file – 2 faces, all feeds. (LARGE: ~4.4GB zipped)

Kinect Recording – 2 faces

Hope that helps!

Kinect Development Without a Kinect

Huh? How can you develop software that integrates with the Microsoft Kinect if you don’t have a physical Kinect? We have a number of Kinect devices around the office, but they’re all in use. I want to test and develop an application we’re writing, so … there is another way.

Enter Kinect Studio v2.0. This application is installed with the Kinect v2.0 SDK, and allows you to record and play back streams from the Kinect device. It’s usually used to debug a repeatable scenario, but we’ve been using it to spread the ability to develop Kinect-enabled applications to engineers who don’t have a physical Kinect device. There are just a couple of settings to be aware of to get this to work.

Someone has to record the streams in the first place. They can select which streams (RGB, Depth, IR, Body Index, etc.; the list of streams is shown below) to include in the recording. The recording is captured in an XEF file that can get large quickly depending on which streams are included (on the order of 4 GB+ for one minute). Obviously, you need to include the streams that you’re looking to work with in the application you’re developing.

Streams to Capture

Choose from many streams to include in the recording

So I have my .XEF file to playback, what next?

  • Open the XEF file in Kinect Studio
  • Go to the PLAY tab
  • IMPORTANT: Select which of the available streams you want playback to contain (see screenshot below)
  • Click the settings gear next to the playback window, and select what output you want to see during playback. This does not affect what your application code receives from the Kinect; it controls the display in the Studio UI only.
  • Click the Connect to Service button
  • Click PLAY

You should now start getting Kinect events in your application code.
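For reference, the application code doesn’t change at all when the data is coming from playback. A minimal sketch of consuming body frames with the Kinect v2 SDK (the surrounding app wiring is abbreviated):

```csharp
using Microsoft.Kinect;

KinectSensor sensor = KinectSensor.GetDefault();
sensor.Open();

Body[] bodies = new Body[sensor.BodyFrameSource.BodyCount];
BodyFrameReader reader = sensor.BodyFrameSource.OpenReader();
reader.FrameArrived += ( s, e ) =>
{
    using ( BodyFrame frame = e.FrameReference.AcquireFrame() )
    {
        if ( frame == null )
            return;

        // same code path whether frames come from hardware or Kinect Studio
        frame.GetAndRefreshBodyData( bodies );
    }
};
```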

Here’s what my Kinect Studio UI looks like (with highlights calling out where to change settings).
Hope that helps.

Kinect Studio UI

Channel 9 Interview: Real-world Kinect from TechEd

One of the more fun things I did at TechEd this year was a video interview and demo with my old friend, Dan Fernandez of Microsoft.  I have known Dan for years and we have done some serious damage all over the world.  But, unlike most of the interviews I do, I really like the way this one came out.  Dan is a natural interviewer and led me right into the things folks would be most interested in as it relates to developer productivity and usability in Kinect for Windows.

In the interview Dan and I discuss usability and user experience with Kinect for Windows.  I show Cursor Navigation vs. Avatar Navigation.  I show a zoom and pan gesture versus more traditional gesture-based zooming.  We talk about standards for gesture…or lack thereof…and I painfully talk about lessons learned in usability.

During the interview I jump off the stage and demo:

  1. The Actus Interactive Digital Content Solution – Metro themed and completely content (and application) driven in the cloud, the Actus “interactive kiosk” solution is keyboard/mouse and/or multi-touch driven by the user.  But its wildly popular differentiator is that it is gesture and/or voice controlled with Kinect for Windows.  From the Actus solution I demo:
  2. The InterKnowlogy Kinect for Windows driven Physical Therapy Application.  The demo version of this production app gets a lot of play because the gesture Bryan Coon created for the rotator cuff exercise has a ton of engineering in it to force the user into doing the exercise exactly correctly.
  3. The InterKnowlogy Touch-less Operating Room (TOR).  This app is where we first introduced our own smoothing in avatar mode.  The IK engineers found that we could improve smoothing dramatically by doing it ourselves instead of using the smoothing Microsoft provides with the SDK.  I’m particularly proud of this app because it gets a ton of demo time from Microsoft Execs and will continue to.  But, what is most cool is that it was demo’d by JP Wollersheim of the Kinect team in a Rick Rashid keynote a few weeks back to 6,000 people.  Rick Rashid is the Chief Research Officer of Microsoft.

The Kinect SDK and Xbox: Part 1

No code today, just some general musings and observations on Kinect. I have been playing around with the Kinect SDK for quite a while now, creating lots of fun applications for prototypes, research and just general noodling during our much valued ‘RECESS’ time. One of the things that always pops up during conversations with clients or colleagues is ‘How cool is that! We need to make an Avatar and drive it with the Kinect!’. And I agree, that would be cool.

It turns out that XNA Game Studio has its own avatar API, and can even generate a random avatar for you. Getting the avatar to move with your Kinect data is not trivial, but not impossible either – basically you map the Kinect SkeletonData Joints to the corresponding avatar joint or position. Several people have given this a go, with good success… to a point (a simple ball joint example here). The problem lies in the data that comes from the SDK. When it loses track of a joint, or when one joint passes in front of another, the data coming out is just all over the place. If you stand behind a chair, your avatar or skeleton will show your legs flopping around completely akimbo. Not a pretty sight.

That leads into what really interests me: the features I see in Xbox Kinect that are not available in the SDK. For example, the Xbox version has higher fidelity and can track more joints. And even more interesting to me is that the avatars themselves have joint restrictions – so even if the data is garbage, the avatar never moves to a completely bizarre position.

What I would really like is to create an avatar with XNA Game Studio, and drive it with the data from the Kinect SDK with these joint restrictions applied. And then maybe, with some changes, publish my application to an Xbox using the Kinect on that platform as well. Well, it seems you can’t get there from here – while XNA Game Studio will allow you (with the proper licensing) to publish games to both PC and Xbox, the Kinect packages are obviously different and incompatible.

So just to explore it, what does it take to become an Xbox Kinect developer? Well, I can’t say I have the conclusive answer on this. Not being part of that community, it seems the information is difficult to obtain. From the Microsoft site, you need to be part of the Xbox 360 Registered Developers Program. And to become a partner in that program you need to email them with a concept and wait a bit. And then sign an NDA. And then wait a bit more. And that’s if you do not have a Development Account Manager. If you do, then I guess you already know what to do.

I want information. How much, exactly, does it cost? Are there any smaller Kinect for Xbox developers out there? Or is it restricted to big game developers? What equipment and kits are needed? How difficult is it to port a game from the PC with the SDK to Xbox? Judging from the blast of media from Microsoft regarding the Kinect, I would have thought this information would be easier to access. Ah well, these are the questions I hope to have answered in Part 2!

SpeechRecognitionEngine Grammar Choices and Updating them Dynamically

The Microsoft Speech Platform provides a great speech recognition engine. I’ve been using it with the Microsoft Kinect to add voice command functionality to existing Kinect enabled WPF applications. The SDK is located here, and once installed you just need to add a reference to Microsoft.Speech.dll, which is located at C:\Program Files\Microsoft Speech Platform SDK\Assembly.

Static Choices
Once you have the SDK and assembly referenced, the process of specifying what words or phrases you want the SpeechRecognitionEngine to listen for is extremely straightforward. Just specify your collection of Choices, create a new GrammarBuilder and append the Choices to it, create a Grammar from the GrammarBuilder, and finally load that Grammar into the SpeechRecognitionEngine. You can then point the SpeechRecognitionEngine at your audio source and tell it to RecognizeAsync.

// ri is the RecognizerInfo for the speech recognizer you want to use
var sre = new SpeechRecognitionEngine( ri );

var currentChoices = new Choices();
currentChoices.Add( "show blue square" );

var gb = new GrammarBuilder { Culture = ri.Culture };
gb.Append( currentChoices );

sre.LoadGrammar( new Grammar( gb ) );

Dynamic Choices
If you know all of the possible command choices from the beginning and never need them to change, then this is as far as you need to go. I, however, wanted to be able to change which choices were valid based on what area of the application I was in. I had hoped it would be as simple as adding and removing choices from the collection, but there isn’t any remove functionality in Choices, GrammarBuilder, or Grammar. On the SpeechRecognitionEngine I did find an UnloadGrammar method, so I figured I could just keep a collection of my currently valid choices, create a new Grammar from those, unload the old Grammar, and then load the new one. When I ran the application I got some very weird results. Calling UnloadGrammar would take a very long time to execute (3 minutes, in one try). Once it got past that point, loading the new grammar worked, but the delay was unbearable; I could not use it if there was always the possibility of the application freezing for that long just to change the available audio choices. After a long time searching I finally decided to see what MSDN had to say about the UnloadGrammar and LoadGrammar methods and found this page. It is for Office 2007 R2, but the example it provided put me on the right track.

It turns out that when the SpeechRecognitionEngine is running, any changes to the grammar need to occur when the engine is ready for the changes. To get the engine ready you need to call RequestRecognizerUpdate and pass a custom object (that contains what action you want to do and the related data) into the method as a UserToken.
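The shape of that token object is up to you; the snippets below assume something like the following (both the class and the enum are my own names, not part of the SDK):

```csharp
using Microsoft.Speech.Recognition;

// hypothetical token type - the engine only requires an object for UserToken
public enum GrammarRequestType
{
    LoadGrammar,
    UnloadGrammar
}

public class UpdateGrammarRequest
{
    public GrammarRequestType RequestType { get; set; }
    public Grammar Grammar { get; set; }
}
```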

// unload the old grammar ...
sre.RequestRecognizerUpdate( new UpdateGrammarRequest
                             {
                                 RequestType = GrammarRequestType.UnloadGrammar,
                                 Grammar = _currentGrammar
                             } );

// ... rebuild _currentGrammar from the currently valid choices, then load it
sre.RequestRecognizerUpdate( new UpdateGrammarRequest
                             {
                                 RequestType = GrammarRequestType.LoadGrammar,
                                 Grammar = _currentGrammar
                             } );

You also need to subscribe to the RecognizerUpdateReached event on the engine, and in that event handler you can call the UnloadGrammar and LoadGrammar methods which will then execute immediately.

private void RecognizerUpdateReached( object sender, RecognizerUpdateReachedEventArgs e )
{
    var request = e.UserToken as UpdateGrammarRequest;
    if ( request == null )
        return;

    var sre = (SpeechRecognitionEngine)sender;
    switch ( request.RequestType )
    {
        case GrammarRequestType.LoadGrammar:
            sre.LoadGrammar( request.Grammar );
            break;
        case GrammarRequestType.UnloadGrammar:
            sre.UnloadGrammar( request.Grammar );
            break;
        default:
            throw new ArgumentOutOfRangeException();
    }
}

By doing this, you still end up with a tiny bit of lag between calling RequestRecognizerUpdate and having the RecognizerUpdateReached event fire, but it is only ever a couple of seconds instead of minutes.

Kinect in Windows 8 on .NET Rocks

My coworker Danny Warren and I recorded a .NET Rocks session a couple of weeks ago that just went live tonight.  We discuss how we got a Windows 8 / WinRT application to communicate with the Microsoft Kinect.  I blogged about how we pulled that off here, but check out the podcast to hear it first hand.

.NET Rocks show #714 – Dan Hanan and Danny Warren Mix Kinect and Metro

Using Kinect in a Windows 8 / Metro App

We have been working with the Kinect for a while now, writing various apps that let you manipulate the UI of a Windows app while standing a few feet away from the computer – the “10 foot interface” as they call it.  Very cool stuff.  These apps make use of the Microsoft Kinect for Windows SDK to capture the data coming from the Kinect and translate it into types we can use in our apps:  depth data, RGB image data, and skeleton points.  Almost all of these apps are written in C# / WPF and run on Windows 7.

Last month a few of us went to the Microsoft //BUILD/ conference, and came back to start writing apps for the new Windows 8 Metro world.  Then naturally, we wanted to combine the two and have an app that uses Kinect in Windows 8 Metro.  At InterKnowlogy we have a “Kiosk Framework” that fetches content (images, audio, video) from a backend (SQL Server, SharePoint) and has a client for various form factors (Win7, Surface, Win Phone 7) that displays the content in an easy-to-navigate UI.  Let’s use the Kinect to hover a hand around the UI and push navigation buttons!  Here’s where the story begins.

One of the First Metro Apps to Use Kinect

Applications that are written to run in Windows 8 Metro are built against the new Windows Runtime (WinRT) API, which is the replacement for the old Win32 API that we’ve used since Windows 3.0.  The problem when it comes to existing code is that assemblies written in .NET are not runtime compatible with WinRT (which is native code).  There is a lot of equivalent functionality in WinRT, but you have to port existing source code over, make changes where necessary, and compile specifically against WinRT.  Since the Kinect SDK is a set of .NET assemblies, you can’t just reference it in your WinRT / Metro app and start partying with the Kinect API.  So we had to come up with some other way…

You CAN write a .NET 4.5 application in Windows 8 using Visual Studio 11, and it will run on the “desktop” side of the fence (an alternate environment from the Metro UI, used for running legacy apps).  So we decided to take advantage of this and write a “Service” UI that runs in the classic desktop environment, connects to the Kinect and receives all the data from it, and then furnishes that data out to a client running on the Metro side.  The next issue was – how to get the data over to our Kiosk app running in Metro?  Enter web sockets.  There is a native implementation of web sockets in the WinRT framework, and we can use that to communicate on a socket channel over to the .NET 4.5 desktop, which can reply to the client (Metro) socket with the Kinect data.
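For reference, here’s roughly what the Metro end of that socket channel looks like with the WinRT classes. The port number and the request message format are made up for illustration; they just have to match whatever the desktop service uses:

```csharp
using System;
using Windows.Networking.Sockets;
using Windows.Storage.Streams;

// connect from the Metro client to the desktop "service"
var socket = new MessageWebSocket();
socket.Control.MessageType = SocketMessageType.Binary;
socket.MessageReceived += OnMessageReceived;    // e.GetDataReader() in the handler

// hypothetical port - it just has to match the desktop service
await socket.ConnectAsync( new Uri( "ws://localhost:4530" ) );

// poll at the desired frame rate by sending a simple request message
var writer = new DataWriter( socket.OutputStream );
writer.WriteString( "GET" );    // request format is up to you
await writer.StoreAsync();
```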

Some Bumps in the Road

Writing the socket implementation was not conceptually difficult.  We just want the client to poll at a given frame rate, asking for data, and the service will return simple Kinect skeleton right-hand position data.  We want to open the socket, push a “request” message across to the service, and the service will write binary data (a few doubles) back to the caller.  When pushing bytes across a raw socket, obviously the way you write and read the data on each side must match.  The first problem we ran into was that the BinaryWriter in the .NET 4.5 framework was writing data differently than the DataReader in WinRT was receiving the data.

As with any pre-release software from MS, there is hardly any documentation on any of these APIs.  Through a ton of trial and error, I found that I had to set the Unicode encoding and byte-order settings on each side to something that would match. Note the encoding and byte-order lines in the following code snippets.

    // Send data from the service side
    using ( MemoryStream ms = new MemoryStream() )
    {
        using ( BinaryWriter sw = new BinaryWriter( ms, new UnicodeEncoding() ) )
        {
            lock ( _avatarPositionLock )
            {
                // a short tracking state followed by two doubles
                sw.Write( (short)_lastRightHandPosition.TrackingState );
                sw.Write( _lastRightHandPosition.X );
                sw.Write( _lastRightHandPosition.Y );
            }
        }

        Send( ms.GetBuffer() );
    }
    // Receive data in the client
    DataReader rdr = e.GetDataReader();

    // byte-based response: a short tracking state followed by two doubles,
    // so the encoding and byte order must match the BinaryWriter settings above
    rdr.UnicodeEncoding = UnicodeEncoding.Utf16LE;
    rdr.ByteOrder = ByteOrder.LittleEndian;

    var data = new JointPositionData();
    var state = rdr.ReadInt16();
    Enum.TryParse<JointTrackingState>( state.ToString(), out data.TrackingState );
    data.X = rdr.ReadDouble();
    data.Y = rdr.ReadDouble();

    UpdatePositionData( data );

Once I got the socket channel communicating simple data successfully, we were off and running.  We built a control called HoverButton that just checks whether the Kinect position data is within its bounds, and if so, starts an animation to show the user they’re over the button.  If they hover long enough, we fire the Command on the button.

The next problem was connectivity from the client to “localhost”, which is where the service is running (just over in the desktop environment).  Localhost is a valid address, but I would keep getting refused connections.  Finally I re-read the setup instructions for the “Dot Hunter” Win8 SDK sample, which describe a special permission that’s required for a Win8 app to connect to localhost.


Open a command prompt as administrator and enter the following command (substitute your package name for the last param):

    CheckNetIsolation LoopbackExempt -a -n=interknowlogy.kiosk.win8_r825ekt7h4z5c

There is no indication that it worked – I assume silence is golden here.  (I still can’t find a way to list all the packages that have been given this right, in case you ever want to revoke it.)


Finally, a couple of other minor gotchas: the service UI has to be running as administrator (to open a socket on the machine), and the Windows Firewall must be turned OFF.  Now we have connectivity!

What’s Next?

Beyond those two problems, the rest was pretty straightforward.  We’re now fiddling with various performance settings to achieve the best experience possible.  The skeleton data is available from the Kinect on the desktop side at only about 10 frames per second.  We think this lower rate is mostly due to the slower hardware in the Samsung Developer Preview device we got from BUILD.  Given that speed, the Metro client currently asks for Kinect data from the service at 15 frames per second.  We are also working on better smoothing algorithms to prevent a choppy experience when moving the hand cursor around the Metro UI.