Klattersynth TTS - Support Thread

Discussion in 'Assets and Asset Store' started by tonic, Aug 10, 2017.

  1. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
    :eek: Klattersynth TTS
    Learn more from official website of the asset: https://strobotnik.com/unity/klattersynth/

Klattersynth TTS is the first asset of its kind available for the Unity cross-platform engine:
a small and fully embedded speech synthesizer.

    What features does Klattersynth TTS have?
• It does not use the OS or browser speech synth, so it sounds the SAME on all platforms. :cool:
• Dynamically speaks whatever text you give it.
• Generates and plays streamed speech in real-time.
• In WebGL builds the AudioClips are quickly pre-generated and then played.
• Contains an English text-to-speech algorithm (text-to-phoneme transformation).
• Alternatively, you can enter the documented phonemes directly, skipping the English TTS conversion rules.
• You can query the current loudness of the speech to tie effects to the audio.
• Uses normal AudioSource components: 3D spatialization, audio filters and reverb zones work as usual!
• Contained in one ~100 KB cross-platform DLL file.
• When embedded with your game or app and compressed for distribution, it compresses down to less than 30 KB. o_O
• Supports all Unity versions starting from 5.0.0 and is available for practically all platforms targeted by Unity.
    • Also supported by RT-Voice PRO
Why is Klattersynth TTS different from many other speech-related assets for Unity?
    • No need for the underlying platform to offer speech features (OS or browser).
    • No need for a network connection for external generation of audio clips.
• No need to pre-generate the samples before creating a build of your app or game. The clips are either streamed in real time or generated on the fly while the app or game is running.
    Visit the official website of the asset to try out a WebGL build yourself!
    https://strobotnik.com/unity/klattersynth/

Demo videos of Klattersynth TTS:
https://strobotnik.com/unity/klattersynth/
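A minimal usage sketch of the ideas above. Note: only pregenerate() appears verbatim later in this thread; the speak() and currentLoudness member names are illustrative assumptions, so check the documentation for the actual API.

Code (CSharp):
using UnityEngine;

// Sketch only: "speak" and "currentLoudness" are assumed member
// names for illustration; consult the asset documentation for the
// actual Klattersynth API.
public class KlattersynthHello : MonoBehaviour
{
    public Speech speech;            // Klattersynth speech component
    public Transform mouthIndicator; // e.g. an object scaled by loudness

    void Start()
    {
        // English TTS: the text is converted to phonemes internally.
        speech.speak("hello world");
    }

    void Update()
    {
        // Query the current loudness to tie effects to the audio.
        float loudness = speech.currentLoudness;
        mouthIndicator.localScale = Vector3.one * (1f + loudness);
    }
}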
     
    Last edited: Mar 17, 2024
  2. Obsurveyor

    Obsurveyor

    Joined:
    Nov 22, 2012
    Posts:
    277
    Is this considered done or are you still working on the phonemes? The F's sound more like static and Th's are kind of just a pop. Also, in the WebGL demo, the base frequency doesn't seem to affect whisper very much. Are there more audio tweaks available?
     
  3. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
Hi @Obsurveyor, I won't be actively working on the sounds of the phonemes. It's only a distant possibility that I'd add 1-2 more later, or try to adjust them. But with this technique there aren't going to be huge improvements in that area; a synth this small is bound to have some limitations.

The example voices in the "Text Entry" demo are made by adjusting the three available parameters: "Ms Per Speech Frame" (effectively controls the speed), "Flutter" and "Flutter Speed" (which can, for example, add a bit of unsteady weirdness to the sound, although normally the flutter is just a somewhat inaudible variance in the voice wave).

    Here's an image from the inspector:
    upload_2017-8-11_10-25-34.png
    (this is the "Slow and unsteady" voice of the text entry demo)
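For reference, a hedged sketch of setting those parameters from code. Only msPerSpeechFrame is mentioned by name later in this thread; flutter and flutterSpeed are guessed from the inspector labels above, so verify against the actual Speech API.

Code (CSharp):
using UnityEngine;

// Assumed field names based on the inspector labels above; only
// "msPerSpeechFrame" is mentioned by name later in this thread.
public class VoicePreset : MonoBehaviour
{
    public Speech speech; // Klattersynth speech component

    void Awake()
    {
        // msPerSpeechFrame is locked in at initialization, so set it
        // before the synth starts speaking.
        speech.msPerSpeechFrame = 15; // larger frame time = slower speech
        speech.flutter = 0.5f;        // unsteady variance in the voice wave
        speech.flutterSpeed = 0.2f;   // how quickly that variance wanders
    }
}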
     


  4. DbDib

    DbDib

    Joined:
    May 23, 2015
    Posts:
    12
Very interesting; a couple of questions though. Since it's being generated in real time, is it possible to adjust the actual speed/pitch in real time as well? (e.g. in the WebGL demo, being able to adjust "Base Voice Frequency" and having it change in real time instead of having to prerender it, though I understand WebGL HAS to have it prerendered). If so, this would be PERFECT for my needs! And as for my second question - I completely forgot what it was! haha.
     
  5. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
Hi @DbDib, you're correct - WebGL has to have audio prerendered, so in WebGL builds Klattersynth will need to generate the whole clip just before playing it. It doesn't take long, but it is pre-generated before actually starting to play the clip.

However, it is of course possible to just adjust the pitch parameter of the AudioSource playing the generated clip, as you can with any AudioClip. Lowering the pitch will of course also slow the playback down at the same time (and vice versa).
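For example, something like this, using only the standard Unity API (speechAudioSource is assumed to be the AudioSource the generated clip plays through):

Code (CSharp):
using UnityEngine;

// Shifts the pitch of an already-generated speech clip via the
// standard AudioSource.pitch property.
public class SpeechPitchControl : MonoBehaviour
{
    public AudioSource speechAudioSource; // plays the generated clip

    [Range(0.5f, 2f)]
    public float pitch = 1f;

    void Update()
    {
        // Lowering the pitch also slows playback down, and vice versa,
        // exactly as with any other AudioClip.
        speechAudioSource.pitch = pitch;
    }
}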

When used in streaming mode, the synth latches onto the parameters given at the time it starts to speak that particular line (the msPerSpeechFrame is also locked in at initialization time, to minimize any extra memory allocations needed later). Even real-time streamed audio is generated in batches, so fine-tuned control of the parameters would need to be specified in advance (unless the batch size is very small). That's not a feature of the API now, but it's a possibility for a future version.

However, the currently supported approach is to simply instruct the synth to speak e.g. just a single word at a time, and adjust the base frequency for each word once the previous one is finished (see the sketch below). This works both with streamed and pre-generated (and possibly cached) speech clips.
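A rough sketch of that word-at-a-time approach (the speak() and isSpeaking member names are assumptions for illustration, not the verified API):

Code (CSharp):
using System.Collections;
using UnityEngine;

// Sketch of the word-at-a-time idea; "speak(text, baseFrequency)" and
// "isSpeaking" are assumed member names.
public class WordByWordSpeech : MonoBehaviour
{
    public Speech speech; // Klattersynth speech component

    public IEnumerator SpeakWords(string[] words, int[] baseFrequencies)
    {
        for (int i = 0; i < words.Length; i++)
        {
            // Pick a new base frequency per word before speaking it.
            speech.speak(words[i], baseFrequencies[i]);

            // Wait until this word has finished before starting the next.
            while (speech.isSpeaking)
                yield return null;
        }
    }
}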
     
  6. lzt120

    lzt120

    Joined:
    Apr 13, 2010
    Posts:
    93
Does this plugin support Chinese words?
     
  7. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
    @lzt120, short answer: No.

Long answer: the text-to-speech only has an approximate mapping for the English language and no other languages. There's support for entering phonemes directly (the documentation has a list of those). It may be possible to compose some Chinese words using the phonemes directly (which would take time and experimentation). But even then there's no way to express the tones in the pronunciation of the Chinese language.

    Thanks for the question.
     
  8. IceBeamGames

    IceBeamGames

    Joined:
    Feb 9, 2014
    Posts:
    170
    Hey Tonic. I am getting this error: "Can't pre-gen speech clips while speech is being streamed (synth is active)".

    I am trying to pre-generate a load of speech clips using this function:

Code (CSharp):
SpeechClip[] GenerateSpeechClipArray(string[] speechStrings)
{
    SpeechClip[] rtn = new SpeechClip[speechStrings.Length];
    StringBuilder speakSB = new StringBuilder();

    for (int i = 0; i < speechStrings.Length; i++)
    {
        // Reuse one StringBuilder to avoid per-string allocations.
        speakSB.Length = 0;
        speakSB.Append(speechStrings[i]);
        rtn[i] = speechSynth.pregenerate(speakSB, voiceFrequency, voicingSource, bracketsAsPhonemes, true);
    }

    return rtn;
}
I'm not entirely sure what I'm doing wrong. Do I need to wait a short time while the speechSynth pregenerates?
     
  9. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
Hi @IceBeamGames,

At a quick glance that looks fine to me.

    Could you verify that the speechSynth instance which you're using is not playing some other speech clip right at the time when you're asking it to pregenerate stuff?

Also, in case the speech synth is flagged to use streaming mode, the AudioSource component used by the Speech component also isn't allowed to be playing anything when the synth is asked to pregenerate a clip.

Does the included Pangrams example work for you? It pre-generates its clips in a batch, so you can use it as a reference. Please check KlattersynthTTS_Example_Pangrams_Controller.cs and the IEnumerator pangramsDemo() method. The if (!clipsGenerated) { ... } code block contains the batch generation.
(Note 1: It's a coroutine, but only to update the progress info while the clips are being generated - it would work just as well without being inside a coroutine. Note 2: There are 3 different speech synths used in the batch generation, but it works just as well if the code is modified to use a single one.)
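In other words, make sure nothing is playing before pregenerating. A sketch of one way to do that - the pregenerate() call follows your snippet, while the surrounding field names and the stop-then-wait detail are assumptions:

Code (CSharp):
using System.Collections;
using System.Text;
using UnityEngine;

// Sketch: stop any ongoing playback before batch pregeneration.
// The pregenerate() signature follows the snippet above; treat the
// other member and field names as assumptions.
public class BatchPregenExample : MonoBehaviour
{
    public Speech speechSynth;
    public int voiceFrequency = 220;
    public int voicingSource;      // likely an enum in the real API
    public bool bracketsAsPhonemes;

    public IEnumerator Pregenerate(string[] lines, SpeechClip[] result)
    {
        // The synth's AudioSource must not be playing anything when
        // pregeneration is requested (especially in streaming mode).
        AudioSource source = speechSynth.GetComponent<AudioSource>();
        if (source != null && source.isPlaying)
            source.Stop();
        yield return null; // let the synth go inactive for a frame

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.Length; i++)
        {
            sb.Length = 0;
            sb.Append(lines[i]);
            result[i] = speechSynth.pregenerate(sb, voiceFrequency,
                voicingSource, bracketsAsPhonemes, true);
        }
    }
}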
     
  10. Larse232312

    Larse232312

    Joined:
    Aug 28, 2013
    Posts:
    1
Is there a way to see the documented phonemes before buying the pack?
     
  11. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
  12. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
    Klattersynth TTS is now also supported by RT-Voice PRO!
     
  13. r618

    r618

    Joined:
    Jan 19, 2009
    Posts:
    1,302
    Hi,

just imported into a new project (2019.1.8), opened the KlattersynthTTS_Example_TextEntry scene,
    entered text, e.g.:
    'is there anybody in there'

the synth completely ignores the last two words (with the added bonus of speaking _something_ at the beginning if any other text was previously entered (I think))

the webgl demo (https://strobotnik.com/unity/klattersynth/demo/), on the other hand, behaves rather differently, and as expected

can the package demo scene be configured to get reasonable results, at least comparable to the webgl version?
if yes, why are those settings not the same as in the webgl demo?

Even single words such as 'Help' are not spoken identically (with the other added bonus that it's sometimes apparently necessary to press the enter key twice in the textbox to start the speech)

Note: the displayed settings are exactly the same, i.e. I just ran the demo scene without any changes after importing

Another rather unpleasant surprise - is there any reason why it's distributed as an assembly only?
I would _very much_ rather have access to the code, esp. for cases like the above - if they're not fixable via the exposed user settings.

    Thanks !!

    edit: grammar
     
    Last edited: Jun 28, 2019
  14. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
Hi @r618, that sounds like an unexpected regression. There definitely shouldn't be any notable difference between the WebGL version and the package. I'll investigate this.

About being DLL-only: I don't have current plans to release it in source code form.
     
  15. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
    @r618 I have reproduced the issue (using 2019.2 beta).

    As a workaround, you can disable the "Use Streaming Mode" setting for the Speech component:
    disable_streaming_mode.png
    (By default it is enabled).
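If you'd rather apply the workaround from code than in the inspector, something like this should be equivalent. The field name "useStreamingMode" is a guess based on the inspector label, so verify against the actual Speech API:

Code (CSharp):
using UnityEngine;

// Workaround sketch: force non-streamed clip generation. The field
// name "useStreamingMode" is guessed from the "Use Streaming Mode"
// inspector label above.
public class StreamingModeWorkaround : MonoBehaviour
{
    public Speech speech;

    void Awake()
    {
        // Non-streamed mode composes the whole clip before playing,
        // avoiding the streaming regression on affected Unity versions.
        speech.useStreamingMode = false;
    }
}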

The playback code for speech has two modes: streamed and non-streamed. Streamed means it pumps the speech synth frequently for data (on the fly), while non-streamed mode composes the whole speech clip before playing it. In practice there's no big difference - even non-streamed mode is quick to compose the needed clip and play it. WebGL is forced to use non-streamed mode, which is why that particular regression is not likely to happen on the web. When I released Klattersynth I tested pretty extensively on a wide variety of versions, so I guess this is probably an issue happening only with later version(s) of Unity.

    So, you can use the workaround for now. I'll debug this once I have the chance. Looks like I finally must make a new release after the initial 1.0.0, as this is the first reported issue which clearly must be fixed... :D
     
  16. r618

    r618

    Joined:
    Jan 19, 2009
    Posts:
    1,302
    k, will try it out later
so, with streamed mode the text can potentially be in the range of 'tons', right?
     
  17. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
@r618, yes (if the streamed mode worked, that is... sorry for this issue). Although my guess is that you will likely ultimately have some reason to split the speech into smaller parts and play back each when convenient.
     
  18. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
EDIT! NOTE:
The comment below is partially correct. But after some further tests, I think there is no regression in Unity 2018.3. Instead it has a somewhat different underlying implementation, which actually fixes some of the weirdness that used to require hacks on earlier versions. It's still slightly unexpected in some minor details. I modified the text below to be tiny text, since it's partly misleading.


It seems that starting from Unity 2018.3 there's a change with audio clips using reader callbacks (I'd say it's a regression, and I have an isolated test which shows the issue between 2018.2 and 2018.3). Sorry for not noticing this when testing for compatibility with the latest Unity versions.

When reusing the same streamed clip (stopping it in between), it no longer asks for new data at Play(); instead it first plays some old data it already had in the buffer. Because of this the playback starts with a lag (if the old data was just silence), or even with wrong audio if Unity internally already had something in the existing sample buffer.

Additionally, because of the above issue, my code which monitors when the sound can be stopped is not in sync with what you hear, since the actual new data (start of new speech) is requested by Unity only after a considerable pause (when playing the old buffer has finished).

Initial tests seem to indicate that the only way to cope with this for now is to create a new audio clip every time, even though it used to work fine to keep reusing the same clip. This will create more memory pressure, with new clips being created & deleted.
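For anyone curious, that pattern looks roughly like this with Unity's public API (the synth internals below are placeholders, not Klattersynth code):

Code (CSharp):
using UnityEngine;

// Illustration of the workaround: create a fresh streamed AudioClip
// (with a PCM reader callback) per utterance instead of reusing one
// across Stop()/Play().
public class FreshStreamedClip : MonoBehaviour
{
    public AudioSource source;
    const int sampleRate = 11025; // placeholder sample rate

    public void StartSpeech()
    {
        // A new streamed clip each time; Unity pulls samples on demand.
        AudioClip clip = AudioClip.Create("speech", sampleRate, 1,
            sampleRate, true, OnAudioRead);
        source.clip = clip;
        source.Play();
    }

    void OnAudioRead(float[] data)
    {
        // Fill "data" with the next synthesized samples here; zeros
        // stand in for the real synth output.
        for (int i = 0; i < data.Length; i++)
            data[i] = 0f;
    }
}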

I'll see about filing a bug report, and work on a new version with an internal fix to get streamed mode working again.


Until then, the workaround is either to use Unity 2018.2.x or older, or to disable "Use Streaming Mode" when using Unity 2018.3+.
     
    Last edited: Jun 30, 2019
  19. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
I edited my previous post, as it turns out I was able to make a properly looping audio source & custom-filled clip, reusing the same one across stop/play, without the unnecessary lags I first thought there would be.

    I have a working proof of concept which works both on older Unity versions (with the quirks previously needed), and on newer Unity versions just as well.

It'll take maybe a day or two until I get it integrated into Klattersynth and a new version out.
     
  20. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
    r618 likes this.
  21. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
    boorch likes this.
  22. specularpro

    specularpro

    Joined:
    Nov 8, 2017
    Posts:
    6
    Does this support the Quest 2?
     
  23. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
    Hi @specularpro, I don't know for sure as I don't have the hardware to test with. But there's nothing that special about the asset, so I don't see why it wouldn't work, at least if other relatively simple assets generally work with Quest 2.
     
    specularpro likes this.
  24. specularpro

    specularpro

    Joined:
    Nov 8, 2017
    Posts:
    6
    Is there an apk I can test? Does it work with native TTS or is there an embedded TTS plugin?
     
  25. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
@specularpro sorry, I don't have a test apk to share, and no spare moment to make one right now.
It does not use native TTS. The TTS synth is fully embedded, and there's no platform-specific code. That's why it should basically work on all Unity-supported platforms, and it also sounds exactly the same on all of them.
The synth is based on an old and relatively simple open-source synth called rsynth, more specifically the variant featured in the SoLoud library. The synth code in Klattersynth has been converted to a different language, and modified and optimized for use with Unity.
     
    specularpro likes this.
  26. RamonLion

    RamonLion

    Joined:
    Jul 19, 2012
    Posts:
    11
    Hey Tonic! I love Klattersynth so much, I want to use it for every project now!

    I wanted to know all the ways in which one could expose variables and settings to get varying results from the vocalization.

I'm trying to expand on the TTS example scene to make a voice creator for a cast of characters. I'm really hoping to squeeze as much characterization as possible out of this simple, lo-fi system.

Any resources or leads on how I can expand on this would be awesome.
     
  27. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439
Hi @RamonLion, glad to hear you like the asset!
However, the included demo shows pretty much everything that is controllable in the synth for now.
You can surely create a few different characters out of those parameters, but I guess you'll still reach a "limit" after a few...
But maybe you could also use the "brackets as phonemes" feature. Please read the "List of Recognized Phonemes" page carefully in the instructions, and then try to submit a speech string like this: "[beIZ xIp]" (beige ship). Remember to pass true for the bracketsAsPhonemes parameter if you are calling the API.
Writing out some characters' speech using phonemes, but varying a bit how they pronounce words, could add some additional flavor (see the sketch below).
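For example (speak() is an assumed method name for illustration; the pregenerate() call earlier in the thread shows the confirmed bracketsAsPhonemes flag):

Code (CSharp):
// "speak" is an assumed method name; the key point is passing true
// for the bracketsAsPhonemes parameter so bracketed text is read as
// phonemes rather than through the English TTS rules.
void SayBeigeShip(Speech speech)
{
    speech.speak("[beIZ xIp]", bracketsAsPhonemes: true); // "beige ship"
}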
     
    RamonLion likes this.
  28. tonic

    tonic

    Joined:
    Oct 31, 2012
    Posts:
    439