CloudTopia: Connecting o365, Apps, Azure and Cortana – Part 6

In Part 5 of this series we looked at all of the different ways in which CloudTopia is integrated with Office 365 through our custom Web API REST controller. We’re going to wrap up this series with a little sizzle, and look at the custom Windows Phone app we wrote to work with CloudTopia using voice recognition, speech, and Cortana.

Here are some quick links to the whole series:

  1. Intro
  2. Open Graph
  3. Web API
  4. Azure Integration
  5. Office 365 Integration
  6. Cortana, Speech and Windows Phone Integration

You heard it here first – a picture’s worth a thousand words. :)  To start with I’d recommend watching the demo video of the Cortana Social Events app, which you can find here: https://1drv.ms/1kPjjKf.

Okay, hopefully you watched the video and now your appetite has been whetted a little. So let’s talk Windows Phone 8.1, using voice and integrating with Cortana. The implementation itself consists of two parts:

  • Your custom app that does voice recognition and optionally speech.

  • Integration with Cortana

All of the coding and logic happens in your app. Integration with Cortana actually just happens via an Xml file, which is both surprisingly powerful and surprisingly easy. In fact that’s my theme for this entire post – the folks that have been doing the work on speech in Windows Phone need to take a friggin’ bow: their stuff works pretty dang well and the implementation is much simpler than I expected.

So how do we do it? Well first and foremost I would recommend that you download the MSDN sample app that demonstrates all this functionality from https://aka.ms/v4o3f0. When I first started looking around I didn’t find any kind of handy tutorial to get me on my way so I relied heavily on this app. With that as a resource for you to lean on, here are the basic steps to integrating voice recognition, speech and Cortana into your Windows Phone apps:

  • Create a new Windows Phone Silverlight App

  • Add a SpeechRecognizer instance in code to listen; add a SpeechSynthesizer to speak

  • Initialize the Recognizer

  • Tell the Recognizer to start listening

  • When the Recognizer’s completed handler fires, take the recognized words and do “something”

 

Create New Windows Phone Silverlight App

This should be pretty obvious, but I just want to make sure I call it out. As of the time I’m writing this post, all of this voice goodness is not integrated with Windows Universal Apps, so when you create a new project, create a Windows Phone Silverlight app project type as shown here:

 

Add Speech Recognizer and Synthesizer

Start by adding instances of SpeechRecognizer and SpeechSynthesizer, plus a few other fields used for maintaining state, to the “main” page of your application (typically mainpage.xaml). This is the page where you’re going to do your voice recognition, and eventually connect to Cortana. The declarations you add should look like this:

// State maintenance of the Speech Recognizer
private SpeechRecognizer Recognizer;
private AsyncOperationCompletedHandler<SpeechRecognitionResult> recoCompletedAction;
private IAsyncOperation<SpeechRecognitionResult> CurrentRecognizerOperation;

// State maintenance of the Speech Synthesizer
private SpeechSynthesizer Synthesizer;
private IAsyncAction CurrentSynthesizerAction;

 

In order to get these all to resolve you may need to add some using statements for Windows.Phone.Speech.Recognition, Windows.Phone.Speech.Synthesis, and Windows.Phone.Speech.VoiceCommands.
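If you want to copy and paste, those three directives are just:

using Windows.Phone.Speech.Recognition;
using Windows.Phone.Speech.Synthesis;
using Windows.Phone.Speech.VoiceCommands;

Okay, step 2 complete.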

Initialize the Recognizer

The next step is to initialize the Recognizer – this is what’s going to do the speech recognition for us. To start that process we’ll create a new instance of the Recognizer like so:

this.Recognizer = new SpeechRecognizer();

 

Once the Recognizer has been created we need to add grammar sets. The grammar sets are the collection of words the Recognizer is going to use to recognize what it hears. While this could potentially be a daunting task, it is – surprise – pretty easy in most cases. Here’s what I did to populate my grammar set:

this.Recognizer.Grammars.AddGrammarFromPredefinedType("search", SpeechPredefinedGrammar.WebSearch);
await this.Recognizer.PreloadGrammarsAsync();

 

Boom – there you go – done adding my grammar set. There are a couple of grammar sets that ship out of the box, and so far I have found the WebSearch set to do everything I need. You can also create your own grammar sets if you like; dev.windowsphone.com has documentation on how to do that. The last thing you want to do for the initialization is to define what to do when speech is actually recognized. To do that we’re going to add a handler for when speech recognition is completed. Remember above where I made this declaration – private AsyncOperationCompletedHandler<SpeechRecognitionResult> recoCompletedAction? We’re going to use that recoCompletedAction variable now, like this:

recoCompletedAction = new AsyncOperationCompletedHandler<SpeechRecognitionResult>(
    (operation, asyncStatus) =>
    {
        Dispatcher.BeginInvoke(() =>
        {
            this.CurrentRecognizerOperation = null;

            switch (asyncStatus)
            {
                case AsyncStatus.Completed:
                    SpeechRecognitionResult result = operation.GetResults();

                    //use the recognized text here
                    LaunchSearch(result.Text);
                    break;

                case AsyncStatus.Error:
                    //respond to error; often the user
                    //hasn't accepted the privacy policy
                    break;
            }
        });
    });

So what we’re saying here is that when the speech recognition event happens, if we completed recognition successfully then we’re going to extract the collection of recognized words by getting a SpeechRecognitionResult and looking at its Text property. If there was an error, then we’ll need to do something else. Where I found the error condition happening was after building my app, when I tried the app out on the emulator. You always have to accept the privacy policy before it will do voice recognition and it’s an easy thing to forget when you start up a new emulator session. Other than that though, you are set – that’s all you need to do to initialize the Recognizer before you start listening for speech.
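One quick aside before moving on: I mentioned above that you can also create your own grammar sets. I didn’t need one for CloudTopia, but a minimal sketch would look something like this – the grammar key and the list of phrases are made up purely for illustration, and List<string> means System.Collections.Generic needs to be in scope:

//register a small custom list of phrases in addition to (or instead of)
//the predefined WebSearch grammar; key name and phrases are illustrative
this.Recognizer.Grammars.AddGrammarFromList("eventCommands",
    new List<string> { "find events", "show my events", "search for events" });
await this.Recognizer.PreloadGrammarsAsync();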

Use the Recognizer to Start Listening

Now that the Recognizer is configured you can start listening. It’s pretty easy – just start listening asynchronously and configure the listening completed handler you want to use, which will be the recoCompletedAction variable I was describing above:

this.CurrentRecognizerOperation = this.Recognizer.RecognizeAsync();
this.CurrentRecognizerOperation.Completed = recoCompletedAction;

As far as speech recognition goes, that’s it – you’re done! Pretty easy, huh? It’s amazingly simple now to add voice recognition to your apps, so I recommend you go out and give it a whirl. For the CloudTopia application, you’ll notice that when speech recognition completes I invoke a method called LaunchSearch. In that method I create a query term that I want to use for CloudTopia and then I send it off to another REST endpoint I created in my SharePoint App project. See, those Web API REST endpoints are very valuable!

 

The way I decided to implement the search capability was to make the spoken query search for Twitter tags that have been configured for an event. So after recognizing some spoken words I get a list of all the words that I think are not noise words. To do that, I just keep a list of words that I consider to be noise and I load those into a List<string> when my Windows Phone app starts up. It includes words like “event”, “Cortana”, “find”, etc. After I eliminate all the noise words I see what’s left and, assuming there are still one or more words remaining, I concatenate them together. They get concatenated because we’re going to be searching for hashtags, and hashtags are always going to be a single word. That makes it easy to say things like “yammer of july” and then have it concatenated into the hashtag “yammerofjuly” (if you read about this series on Twitter you’ll know exactly what I’m talking about).
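GetSearchTerm itself isn’t shown in this post, but conceptually it’s just that filter-and-concatenate step. Here’s a rough sketch of what it could look like, assuming queryTerms is the raw recognized text and noiseWords is the List<string> loaded at startup (this uses LINQ, so System.Linq needs to be in scope); the real implementation in the GitHub project may differ:

private string GetSearchTerm(string queryTerms)
{
    //split the recognized text into words, drop anything in the noise word
    //list, and glue whatever is left together into a single hashtag-style term
    var keptWords = queryTerms.ToLower()
        .Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Where(w => !noiseWords.Contains(w));

    return string.Join("", keptWords);
}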

If I have a hashtag to search for I use the SpeechSynthesizer to speak back to the person what I’m going to search for (I’ll cover that in more detail in just a bit) and then I go send off my query to my REST endpoint. The chunk of code to do all of that is here:

//we really only want to look for a single phrase, which could be several words
//but in a tag is represented as a single word
//in order to do that we'll take all the words we're given,
//extract out the noise words, and then concatenate
//all the remaining non-noise words into a single phrase
string searchTerm = GetSearchTerm(queryTerms);

if (!string.IsNullOrEmpty(searchTerm))
{
    //create the template for speaking back to us
    string htmlEncodedQuery = HttpUtility.HtmlEncode(searchTerm);
    StartSpeakingSsml(String.Format(
        AppResources.SpokenSearchShortTemplate, htmlEncodedQuery));

    //update UI
    Dispatcher.BeginInvoke(() =>
    {
        WaitTxt.Text = "Searching for \"" + searchTerm + "\" events...";
    });

    //execute the query
    QueryEvents(searchTerm);
}
else
{
    WaitTxt.Text = "Sorry, there were only noise words in your search request";
}

 

That code sets us up to send the query to the REST endpoint, and here’s where we actually do the query; one of the things it hopefully demonstrates is just how easy it is to use a REST endpoint. Within the REST controller it takes the search term that was passed in and looks for a match against any of the Twitter tags in SQL Azure. It uses a LIKE comparison so you don’t need to have the exact tag to find a match. If it finds one or more events then it also uses the Yammer Search REST endpoint to query Yammer for matches. It then sends back both the events and the hits from Yammer for us to display in our Windows Phone app. This is coolness…querying the Yammer cloud service by talking to my phone. Love it.

string searchUrl =
    "https://socialevents.azurewebsites.net/api/events/search?tagName=" + searchTerms;

HttpClient hc = new HttpClient();
string data = await hc.GetStringAsync(searchUrl);

//if we got some data back then try and load it into our set of search results of
//social events and Yammer messages
if (!string.IsNullOrEmpty(data))
{
    CortanaSearchResult csr = CortanaSearchResult.GetInstanceFromJson(data);

    Dispatcher.BeginInvoke(() =>
    {
        //if we found some events plug them into our UI by
        //databinding to the lists in the Panorama control
        if ((csr != null) && ((csr.Events.Count > 0) ||
            (csr.YammerMessages.Count > 0)))
        {
            EventsLst.DataContext = csr.Events;
            YammerLst.DataContext = csr.YammerMessages;
        }
        else
        {
            //update the UI to show that there were no search results found
            WaitTxt.Text = "Sorry, I couldn't find any results for \"" +
                searchTerms + "\"";
        }
    });
}
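On the other end of that call, the Web API controller is where the LIKE match against the Twitter tags and the call out to Yammer happen. The real controller lives in the GitHub project; here’s a heavily simplified sketch of its general shape, assuming the usual Web API, SQL client and configuration usings – the class name, connection string name, and table and column names are all just illustrative, and the Yammer call is reduced to a comment:

public class EventsController : ApiController
{
    [HttpGet]
    public async Task<IHttpActionResult> Search(string tagName)
    {
        var events = new List<object>();

        //LIKE match against the Twitter tags stored in SQL Azure, so the
        //spoken search term doesn't have to be an exact tag
        using (var cn = new SqlConnection(
            ConfigurationManager.ConnectionStrings["SocialEvents"].ConnectionString))
        using (var cmd = new SqlCommand(
            "SELECT EventName, TwitterTag FROM Events WHERE TwitterTag LIKE @tag", cn))
        {
            cmd.Parameters.AddWithValue("@tag", "%" + tagName + "%");
            await cn.OpenAsync();
            using (var rdr = await cmd.ExecuteReaderAsync())
            {
                while (await rdr.ReadAsync())
                {
                    events.Add(new
                    {
                        Name = (string)rdr["EventName"],
                        TwitterTag = (string)rdr["TwitterTag"]
                    });
                }
            }
        }

        //if any events matched, this is also where you'd call the Yammer
        //search REST endpoint and add its hits to the response (the OAuth
        //token plumbing and JSON shaping are omitted from this sketch)

        return Ok(new { Events = events, YammerMessages = new List<object>() });
    }
}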

 

Add Speech to Your App

Adding speech to your application is even easier than adding voice recognition. The first thing you’re going to do is create a new instance of the SpeechSynthesizer, like this:

this.Synthesizer = new SpeechSynthesizer();

 

That’s all you do to set it up. To actually have my app say something I created a simple method for it that looks like this:

private void StartSpeakingSsml(string ssmlToSpeak)
{
    //Begin speaking using our synthesizer, wiring the
    //completion event to stop tracking the action
    //when it finishes.
    this.CurrentSynthesizerAction = this.Synthesizer.SpeakSsmlAsync(ssmlToSpeak);
    this.CurrentSynthesizerAction.Completed = new AsyncActionCompletedHandler(
        (operation, asyncStatus) =>
        {
            Dispatcher.BeginInvoke(() =>
                { this.CurrentSynthesizerAction = null; });
        });
}

There are different methods and overloads that you can use to have the phone speak. I chose to use SpeakSsml because of the control it gives you over the text that’s spoken. SSML stands for Speech Synthesis Markup Language, and it’s really just Xml that uses a schema to control things like pitch and rate at which words are spoken. Here’s an example of an Xml template I use to say back to the user what it is we are searching for:

<speak version='1.0' xmlns='https://www.w3.org/2001/10/synthesis' xml:lang="en-US">
  <prosody pitch='+35%' rate='-10%'> Searching </prosody>
  <prosody pitch='-15%'> for </prosody>
  {0} events
</speak>

The way I use that is to create the string to be spoken, like this:

StartSpeakingSsml(String.Format(
    AppResources.SpokenSearchShortTemplate, searchTerm));

 

So if the search term is “yammerofjuly” what is spoken back to the user is “Searching for yammerofjuly”, but it’s said with different pitch and rate around the words “Searching” and “for”. VERY cool stuff.

 

Cortana Integration

Finally, the last thing I did was “integrate” my app with Cortana. What does that mean exactly? Well, I wanted someone to be able to use Cortana in Windows Phone 8.1 to execute a query using my Social Events app. The way you do that is by creating a VoiceCommandDefinition file, which is just another Xml file. In the file you can configure things like the name of your app, examples that can be shown to users who aren’t sure how to use voice recognition with your app, the things your app should listen for, etc.

The first and most important thing I defined in my file is the CommandPrefix; this is really just the name by which my app will be known. The Xml looks like this:

<!-- The CommandPrefix provides an alternative to your full app name for invocation -->
  <CommandPrefix> Social Events </CommandPrefix>

What this means now is that when someone says something to Cortana that starts with “Social Events”, Cortana knows that it needs to use my app to get the results. For example, if I say “Social Events find yammer of july events”, Cortana will figure out that “Social Events” means my app, and it’s going to let my app know that the recognized words were “find yammer of july events”. The way it lets my app know is by launching my app and redirecting me to a page in the application (you can also configure in the VoiceCommandDefinition file which page in your app it redirects to; mine gets sent to mainpage.xaml). In the code behind for your page you can then override the OnNavigatedTo event and look at the query string.

If Cortana sent the user to your app, it will have used a query string variable that you have also defined in your Xml file, in the ListenFor element:

<!-- ListenFor elements provide ways to say the command as well as [optional] words -->
      <ListenFor> Search [for] {dictatedSearchTerms} </ListenFor>

Note the “dictatedSearchTerms” in the curly brackets. There are actually multiple uses for it, but for now I’ll just explain the use case I have. When Cortana redirects a request to my application it will use “dictatedSearchTerms” as the query string variable that contains the recognized words. That means in my override of OnNavigatedTo I can look to see if the QueryString collection contains the key “dictatedSearchTerms”. If it does, I’ll extract the value and then just call my same LaunchSearch method that I described earlier, and pass in the value from the query string.
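That override ends up being just a few lines. Here’s a minimal sketch of the basic shape (System.Windows.Navigation is in scope; the real app does a bit more bookkeeping, so treat this as illustrative):

protected override void OnNavigatedTo(NavigationEventArgs e)
{
    base.OnNavigatedTo(e);

    //if Cortana launched us, the recognized words arrive in the
    //dictatedSearchTerms query string variable defined in the VCD file
    if (NavigationContext.QueryString.ContainsKey("dictatedSearchTerms"))
    {
        string spokenWords = NavigationContext.QueryString["dictatedSearchTerms"];
        LaunchSearch(spokenWords);
    }
}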

One of the other really useful aspects of the Xml file is the ability to create examples of how your app can be used with Cortana. When you say “what can I say” to Cortana, it will display a list of all the apps it knows about that can use voice recognition; if you’ve registered your VoiceCommandDefinition file (more on how to do that in a moment), then your app will be displayed along with an example of what can be said with your app. The example that it displays is what you configured in your VoiceCommandDefinition file, like this:

<!-- The CommandSet Example appears in the global help alongside your app name -->
        <Example> find blazer party events </Example>

 

So in this case it will show the small icon for my application, display my application name in big bold letters, and then show “find blazer party events” underneath it. Again, very cool!
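To give you a feel for how all of those pieces hang together, here’s roughly what a complete VoiceCommandDefinition file looks like on Windows Phone 8.1 (this skeleton follows the 1.1 voice command schema; the Command name, the second ListenFor variation, the Feedback text and the Navigate target are just illustrative and not copied from the CloudTopia project):

<?xml version="1.0" encoding="utf-8"?>
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.1">
  <CommandSet xml:lang="en-US">
    <CommandPrefix> Social Events </CommandPrefix>
    <Example> find blazer party events </Example>

    <Command Name="FindEvents">
      <Example> find yammer of july events </Example>
      <ListenFor> Search [for] {dictatedSearchTerms} </ListenFor>
      <ListenFor> Find {dictatedSearchTerms} [events] </ListenFor>
      <Feedback> Searching for events... </Feedback>
      <Navigate Target="/MainPage.xaml" />
    </Command>

    <PhraseTopic Label="dictatedSearchTerms" Scenario="Search" />
  </CommandSet>
</VoiceCommands>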

If you click the icon of the application it brings it all together and will display something like this:

 

The last step now is to install your definition file. You can install it as many times as you want, and it just overwrites whatever copy it had previously. Because of that, I just install my definition file every time my Windows Phone app starts up. It’s one simple line of code:

await VoiceCommandService.InstallCommandSetsFromFileAsync("ms-appx:///MyVoiceDefinitionFile.xml");

 

Well there you go – that’s it. That is a wrap on this six-part series on the CloudTopia app. Don’t forget to go grab the code from GitHub.com (the exact location is included in Part 1 of this series). I hope you can use this to help navigate your way through some of the very interesting connections you can make in a pure cloud-hosted world with SharePoint Apps and many other cloud services. When you get the chance, you can also add some dazzle to your apps with some voice recognition and Cortana integration. Finally…time to stop writing blog posts and go back to building apps.
