SpeechRecognitionEngine Grammar Choices and Updating them Dynamically

The Microsoft Speech Platform provides a great speech recognition engine. I’ve been using it with the Microsoft Kinect to add voice command functionality to existing Kinect enabled WPF applications. The SDK is located here, and once installed you just need to add a reference to Microsoft.Speech.dll, which is located at C:\Program Files\Microsoft Speech Platform SDK\Assembly.

Static Choices
Once you have the SDK and assembly referenced, the process of specifying what words or phrases you want the SpeechRecognitionEngine to look for is extremely straight forward and easy. Just specify your collection of Choices, create a new GrammarBuilder and append the Choices to it, create a Grammar and provide it with the GrammarBuilder, and finally load that Grammar into the SpeechRecognitionEngine. You can then point the SpeechRecognitionEngine to your audio source and tell it to RecognizeAsync.

var sre = new SpeechRecognitionEngine(ri);
var currentChoices = new Choices();
currentChoices.Add("red");
currentChoices.Add("show blue square");

var gb = new GrammarBuilder
{
     Culture = _ri.Culture
};
gb.Append(currentChoices);

sre.LoadGrammar(new Grammar(gb));

Dynamic Choices
If you know all of the possible command choices from the beginning and never need them to change, then this is as far as you need to go. I, however, wanted to be able to change what choices were valid based on what area of the application I was in. I had hoped it would be as simple as adding and removing choices for the collection, but there isn’t any remove functionality in Choices, GrammarBuilder, or Grammar. On the SpeechRecognitionEngine I did find an UnloadGrammar method so I figured I could just keep a collection of my currently valid choices, create a new Grammar from those, unload the old Grammar, and then load the new one. When I ran the application I ran into some very weird results. Calling UnloadGrammar would take a very long time to execute (it took 3 minutes for one try). Once it got past that point loading the new grammar worked, but the amount of time it was taking was unbearable and I would not be able to use it if there was always the possibility of the application freezing up for that long just to change the available audio choices. So after a long time searching I finally decided to see what MSDN had to say about the UnloadGrammar and LoadGrammar methods and found this page. It is for Office 2007 R2 but the example it provided put me on the right track.

It turns out that when the SpeechRecognitionEngine is running, any changes to the grammar need to occur when the engine is ready for the changes. To get the engine ready you need to call RequestRecognizerUpdate and pass a custom object (that contains what action you want to do and the related data) into the method as a UserToken.

sre.RequestRecognizerUpdate(new UpdateGrammarRequest
								{
									RequestType = GrammarRequestType.UnloadGrammar,
									Grammar = _currentGrammar
								});
sre.RequestRecognizerUpdate(new UpdateGrammarRequest
								{
									RequestType = GrammarRequestType.LoadGrammar,
									Grammar = _currentGrammar
								});

You also need to subscribe to the RecognizerUpdateReached event on the engine, and in that event handler you can call the UnloadGrammar and LoadGrammar methods which will then execute immediately.

private void RecognizerUpdateReached(object sender, RecognizerUpdateReachedEventArgs e)
{
	var request = e.UserToken as UpdateGrammarRequest;
	if (request == null)
		return;
		
	switch (request.RequestType)
	{
		case GrammarRequestType.LoadGrammar:
			sre.LoadGrammar(request.Grammar);
			break;
		case GrammarRequestType.UnloadGrammar:
			sre.UnloadGrammar(request.Grammar);
			break;
		default:
			throw new ArgumentOutOfRangeException();
	}
}

By doing this, you still end up with a tiny bit of lag between calling RequestRecognizerUpdate and having the RecognizerUpdateReached event fire, but it is only ever a couple seconds instead of minutes.

Leave a Reply

Your email address will not be published. Required fields are marked *