Text to Speech

eXntrc · Apr 30, 2016

I asked this on the Microsoft forums but so far haven't heard anything back. I'm hoping the Unity folks may be able to assist.

How can we do text to speech on HoloLens in Unity? I started writing my own bridge using SpeechSynthesizer, but I'm a bit stumped when it comes time to play the stream. It looks like the SynthesizeTextToStreamAsync method returns a SpeechSynthesisStream. Normally you would play that with a MediaElement, but since this HoloLens app is a D3D-only app, I don't have XAML and I don't have MediaElement. Is there another way to play this stream? Or is there a component somewhere in the toolkit that I'm missing that enables this?

Tautvydas-Zilys (Unity Technologies) · Apr 30, 2016

Hi, you should be able to call "ReadAsync" on that stream, then convert that to a byte array using this:

https://msdn.microsoft.com/en-us/library/hh582182(v=vs.110).aspx

Then convert the samples in that byte array to a float array normalized to the range -1 to 1, create an empty AudioClip, and use this to set the data:

http://docs.unity3d.com/ScriptReference/AudioClip.SetData.html

Finally, play said clip.
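A minimal sketch of the normalization step, written in plain C# so it runs outside Unity. It assumes the bytes read from the SpeechSynthesisStream form a RIFF/WAV buffer holding 16-bit little-endian PCM samples, and that the host is little-endian; `WavPcm16` and `ToFloats` are hypothetical names, not part of Unity or the HoloToolkit. Reading the stream into the byte array (the ReadAsync step) is left out because it depends on the WinRT APIs.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Unity or the HoloToolkit: turns a RIFF/WAV
// byte array with 16-bit PCM samples into floats in [-1, 1] for AudioClip.SetData.
// BitConverter reads in host byte order, so this assumes a little-endian host.
public static class WavPcm16
{
    public static float[] ToFloats(byte[] wav, out int channels, out int sampleRate)
    {
        if (wav.Length < 12 || ReadTag(wav, 0) != "RIFF" || ReadTag(wav, 8) != "WAVE")
            throw new ArgumentException("Not a RIFF/WAVE buffer.");

        channels = 0;
        sampleRate = 0;
        int bitsPerSample = 0;
        int pos = 12; // the first chunk follows the 12-byte RIFF header

        // Walk the chunk list: "fmt " describes the format, "data" holds the samples.
        while (pos + 8 <= wav.Length)
        {
            string id = ReadTag(wav, pos);
            int size = BitConverter.ToInt32(wav, pos + 4);
            int body = pos + 8;

            if (id == "fmt " && size >= 16)
            {
                channels = BitConverter.ToInt16(wav, body + 2);
                sampleRate = BitConverter.ToInt32(wav, body + 4);
                bitsPerSample = BitConverter.ToInt16(wav, body + 14);
            }
            else if (id == "data")
            {
                if (bitsPerSample != 16)
                    throw new NotSupportedException("Only 16-bit PCM is handled here.");

                // Trust the declared size only if it fits in the buffer.
                int available = wav.Length - body;
                int dataBytes = (size < 0 || size > available) ? available : size;
                var samples = new float[dataBytes / 2];
                for (int i = 0; i < samples.Length; i++)
                {
                    // Dividing by 32768 maps -32768..32767 onto -1..(just under 1).
                    samples[i] = BitConverter.ToInt16(wav, body + i * 2) / 32768f;
                }
                return samples;
            }

            if (size < 0)
                throw new ArgumentException("Corrupt chunk size.");
            // Chunks are padded to an even number of bytes.
            pos = body + size + (size & 1);
        }
        throw new ArgumentException("No data chunk found.");
    }

    private static string ReadTag(byte[] buffer, int offset)
    {
        return Encoding.ASCII.GetString(buffer, offset, 4);
    }
}
```

In Unity the result could then be passed to `AudioClip.Create("Speech", samples.Length / channels, channels, sampleRate, false)`, filled with `clip.SetData(samples, 0)`, assigned to an AudioSource and played. Walking the chunks rather than assuming a fixed 44-byte header is a defensive choice, since a WAV file may carry extra chunks before the sample data.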

eXntrc · May 4, 2016

Thanks @Tautvydas Zilys. Your pointers were very helpful. I must say it was a good deal harder than I expected, but it is working, and working well.

I have created a TextToSpeechManager and submitted a pull request to add it to the HoloToolkit-Unity.

I have also created an article on how I built the component and how to use it. I mentioned you in the article.

http://www.roadtoholo.com/2016/05/04/1601/text-to-speech-for-hololens/

Tautvydas-Zilys · May 5, 2016

Awesome!

Deleted User · Sep 13, 2016

@eXntrc :
I use the TextToSpeechManager in my project.
So first of all: thanks a lot for that easy-to-use piece of code =)
I have a character whose animation controller needs to know whether or not he's currently talking.

I modified the manager by adding the following fields and method at the beginning of the class:

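private float speakStart = 0;    // Time.unscaledTime when the last clip started playing
private float speakDuration = 0; // length of that clip, in seconds

public bool IsSpeaking()
{
    // True until the clip's full length has elapsed since it started.
    return (Time.unscaledTime - speakStart) < speakDuration;
}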

and the following assignments right after the call to audioSource.Play():

speakStart = Time.unscaledTime;
speakDuration = clip.length;

This way, the script that invokes SpeakText() can also set its animator's "talking" Boolean to IsSpeaking() every frame.
It works (kinda), but I thought I'd see if you (or anyone here) could suggest a more elegant (and correct) way of implementing this feature =)

EDIT: using the AudioSource's isPlaying works
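A minimal sketch of the same elapsed-time check with the Unity dependency factored out, so the logic can be tested on its own. `SpeechTimer` and its methods are hypothetical names; the caller supplies the current time, which inside Unity would be `Time.unscaledTime`. Since that clock ignores `Time.timeScale`, presumably as AudioSource playback does, it keeps the check in step with the audio even when the game is slowed or paused.

```csharp
// Hypothetical, Unity-free version of the IsSpeaking() bookkeeping above.
// "now" is supplied by the caller; inside Unity it would be Time.unscaledTime.
public sealed class SpeechTimer
{
    private float speakStart;
    private float speakDuration;

    // Call right after audioSource.Play() with the clip length in seconds.
    public void Started(float now, float clipLengthSeconds)
    {
        speakStart = now;
        speakDuration = clipLengthSeconds;
    }

    // True while fewer than clipLengthSeconds have elapsed since Started().
    public bool IsSpeaking(float now)
    {
        return (now - speakStart) < speakDuration;
    }
}
```

Checking `audioSource.isPlaying`, as the edit notes, avoids tracking this state by hand at all; the timer version is mainly useful when the caller should not depend on the AudioSource directly.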

eXntrc · Aug 18, 2016

I'd also love to hear if anyone else has a better implementation. If not, I definitely think you should submit it as a pull request. I tried to find a reliable way to do it and couldn't. My own implementation in my voice memo sample does pretty much what you did, but yours is better because I didn't think to use unscaledTime.

Deleted User · Aug 20, 2016

I submitted it as a pull request: https://github.com/Microsoft/HoloToolkit-Unity/pull/181
