Windows PowerShell and the Text-to-Speech REST API (Part 3)


Summary: Use Windows PowerShell to access the Cognitive Services Text-to-Speech API.

Q: Hey, Scripting Guy!

I was reading up on how we could use PowerShell to communicate with Azure to gain an access token. I’ve been just itching to see how we can use this! Would you show us some examples of this in use in Azure?

—TL

A: Hello TL, I would be delighted to! This is a cool way to play with PowerShell as well!

If you remember from the last post, authenticating to Cognitive Services with the following lines returned a temporary access token.

Try

{

[string]$Token=$NULL

# Rest API Method

[string]$Method='POST'

# Rest API Endpoint

[string]$Uri='https://api.cognitive.microsoft.com/sts/v1.0/issueToken'

# Authentication Key

[string]$AuthenticationKey='13775361233908722041033142028212'

# Headers to pass to Rest API

$Headers=@{'Ocp-Apim-Subscription-Key' = $AuthenticationKey }

# Get Authentication Token to communicate with Text to Speech Rest API

[string]$Token=Invoke-RestMethod -Method $Method -Uri $Uri -Headers $Headers

}

Catch [System.Net.Webexception]

{

Write-Output 'Failed to Authenticate'

}

The token was naturally stored in the variable $Token, so it was easy to remember. I suppose we could have named it $CompletelyUnthinkableVariableThatIsPointless, but we didn’t. Because we can use pretty descriptive names in PowerShell, we should. It makes documenting a script easier.

Our next task is to open up the documentation on the Cognitive Services API to see what information we need to supply. We can find everything we need to know here.

Under “HTTP headers,” we can see several pieces of information we need to supply.

HTTP headers table

X-Microsoft-OutputFormat specifies the audio format of the file that is returned.

There are many industry standard types we can use. You’ll have to play with the returned output to determine which one meets your needs.

I found that ‘riff-16khz-16bit-mono-pcm’ is the format needed for a standard WAV file. I chose WAV specifically because I can use the internal Windows services to play a WAV file, without invoking a third-party application.

We’ll assign this to an appropriately named variable.

$AudioOutputType='riff-16khz-16bit-mono-pcm'
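As an illustration of playing a WAV file without a third-party application, here is a minimal sketch. It assumes the .NET class System.Media.SoundPlayer is what handles playback (the post does not say which Windows component it has in mind), and the function name Start-WavPlayback is hypothetical. It runs on Windows only.

```powershell
# Hypothetical helper: plays a WAV file with the .NET SoundPlayer class (Windows only).
function Start-WavPlayback {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    # Fail early with a clear message if the file is missing.
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) {
        throw "WAV file not found: $Path"
    }

    # SoundPlayer wants a full file system path, not a relative one.
    $FullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $Player = New-Object -TypeName System.Media.SoundPlayer -ArgumentList $FullPath
    try {
        # PlaySync blocks until playback has finished.
        $Player.PlaySync()
    }
    finally {
        $Player.Dispose()
    }
}
```

Once a WAV file has been saved to disk, a call such as Start-WavPlayback -Path .\output.wav would play it (the file name here is only an example).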

Both X-Search-AppId and X-Search-ClientID are just unique GUIDs that identify your application. In this case, we’re referring to the PowerShell script or function we’re creating.

The beautiful part is that you can do this in PowerShell right now, by using New-Guid:

Screenshot of PowerShell

If you’d like to be efficient and avoid typing (well, unless you do data entry for a living and you need to type it…I once had that job!), we can grab the Guid property and store it on the clipboard.

New-Guid | Select-Object -ExpandProperty Guid | Set-Clipboard

But the GUID format needed by the REST API is the 32 hexadecimal characters only, without the hyphens. We can fix that with a quick Replace method.

(New-Guid | Select-Object -ExpandProperty Guid).replace('-','') | Set-Clipboard

Run this once for each property, and paste the result into a descriptively named variable, like so:

$XSearchAppID='dccd93ecb3cf4535aac9350c9b5fb2f8'

$XSearchClientID='45b403b6ae0d4f9ca13ca05f61a58ab2'
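If you would rather skip the clipboard, here is a small sketch of another route: the .NET Guid type can produce the hyphen-free form directly through its 'N' format specifier. This is an alternative to the Replace approach above, not the method used in this series.

```powershell
# 'N' formats the GUID as 32 hexadecimal digits with no hyphens or braces.
$XSearchAppID    = [guid]::NewGuid().ToString('N')
$XSearchClientID = [guid]::NewGuid().ToString('N')

# Show the generated values so they can be copied into the script.
Write-Output "AppId:    $XSearchAppID"
Write-Output "ClientId: $XSearchClientID"
```

Keep in mind that this generates fresh values on every run. If you want the IDs to stay the same between runs, generate them once and paste the results into the script as shown above.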

UserAgent is just a unique name for your application. Pick a unique but sensible name.

$UserAgent='PowerShellTextToSpeechApp'

Finally, Authorization is that token that was generated earlier, and is stored in $Token.

At this point, we put the headers together. Do you remember the headers from authentication? That hashtable was small, but the format is the same.

$Headers=@{'Ocp-Apim-Subscription-Key' = $AuthenticationKey }

You can string it all together like this:

$Headers=@{'Property1'='Value';'Property2'='Value';'Property3'='Value';'Property4'='Value';}

But as you add more information, it becomes hard to read for others working on your script. This is a great case for using backticks ( ` ) to separate the content out. Every time I think about backticks, I think of Patrick Warburton as “The Tick.”

Here is an example with the same information, spaced out with a space and then a backtick.

$Headers=@{ `

'Property1'='Value'; `

'Property2'='Value'; `

'Property3'='Value'; `

'Property4'='Value'; `

}
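As a side note, PowerShell also accepts line breaks inside a hashtable literal, so the same layout works without backticks or semicolons. Here is a minimal sketch of that alternative; the variable name $ExampleHeaders is only for illustration.

```powershell
# Inside @{ }, a line break separates entries, so no backtick or semicolon is needed.
$ExampleHeaders = @{
    'Property1' = 'Value'
    'Property2' = 'Value'
    'Property3' = 'Value'
    'Property4' = 'Value'
}

# Display the entries to confirm all four were stored.
$ExampleHeaders
```

Which style you use is a matter of preference. The backtick version needs a space before each backtick and nothing after it, while this version has no such requirement.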

Let’s populate the values for our header from the examples I provided earlier in this page.

$AudioOutputType='riff-16khz-16bit-mono-pcm'

$XSearchAppID='dccd93ecb3cf4535aac9350c9b5fb2f8'

$XSearchClientID='45b403b6ae0d4f9ca13ca05f61a58ab2'

$UserAgent='PowerShellTextToSpeechApp'

$Header=@{ `

'Content-Type' = 'application/ssml+xml'; `

'X-Microsoft-OutputFormat' = $AudioOutputType; `

'X-Search-AppId' = $XSearchAppID; `

'X-Search-ClientId' = $XSearchClientID; `

'Authorization' = $Token `

}
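Because PowerShell quietly treats an undefined variable as $null, a mistyped variable name in a header table can go unnoticed until the REST call fails. Here is a minimal sketch of a check you could run before calling the API. The function name Get-EmptyHeaderName and the sample values are hypothetical and are not part of the Cognitive Services API.

```powershell
# Hypothetical helper: returns the names of headers whose values are null or blank.
function Get-EmptyHeaderName {
    param(
        [Parameter(Mandatory)]
        [hashtable]$Headers
    )

    foreach ($Name in $Headers.Keys) {
        # Casting to [string] turns $null into an empty string before the test.
        if ([string]::IsNullOrWhiteSpace([string]$Headers[$Name])) {
            $Name
        }
    }
}

# Sample data for illustration only; $SampleToken is deliberately left empty.
$SampleToken = $null
$SampleHeaders = @{
    'Content-Type'  = 'application/ssml+xml'
    'Authorization' = $SampleToken
}

# Wrap the call in @() so the result is always an array, even with zero or one match.
$Missing = @(Get-EmptyHeaderName -Headers $SampleHeaders)
if ($Missing.Count -gt 0) {
    Write-Warning ('Empty header values: ' + ($Missing -join ', '))
}
```

With the sample data above, the warning names the Authorization header, which is the kind of slip this check is meant to catch.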

With the header populated, we are now ready to proceed to our next major piece: actually taking text and converting it to audio content, by using Azure.

But we’ll touch upon that next time. Keep watching the blog and keep on scripting!

I invite you to follow the Scripting Guys on Twitter and Facebook. If you have any questions, send email to them at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum.

Sean Kearney, Premier Field Engineer, Microsoft

Frequent contributor to Hey, Scripting Guy!