Word Detection - Verbal Commands

Discussion in 'Made With Unity' started by theylovegames, Sep 23, 2012.

  1. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    WebGL Speech Detection

    WebGL Speech Synthesis







    Word Detection 1.10
    http://u3d.as/3pP

    Assign verbal commands to your gameplay on Standalone, Web Player, Android and iOS. Control your game characters just by talking. Detection is standalone and does not require a network connection. Add verbal commands at runtime.



    Controlling the Character Controller with Word Detection -


    Controlling a talking head with Word Detection -


    Instructions -

    1. Label the first item as Noise (as this will just capture background noise most of the time)

    2. Label each word that you add to make your life easier

    3. Wait 1 second before speaking to let only the noise fill in

    4. Speak the word you aim to detect

    5. Immediately after speaking the word, click the button next to the label, as that will build a profile

    6. Always try to speak the word the same way in the same tone and in the same loudness

    7. If you have trouble matching a word, just repeat the word and click the valid button to update the word details

    8. Try to use dissimilar words (such as: attack, defend, run, escape, affirmative, acknowledged, zzzz) rather than similar ones (such as: go, go go, gos)

    9. You can remove a word by clicking the remove button

    [Demo 1.7] - Word detection can play Mecanim states
    [Demo 1.8] - Word profiles can be saved and loaded
    [Demo 1.9] - Word detection manipulates retro head
    [Demo 1.10] - Word detection selects morph maps
    [Demo 1.14] - Word detection manipulates blend shapes
    [Demo 1.16] - Use word detection to play goat media clips
    [Demo 1.17] - Use word detection to drive the character controller
     
    Last edited: Mar 2, 2017
  2. kenshin

    kenshin

    Joined:
    Apr 21, 2010
    Posts:
    940
    The idea is cool ...but honestly I was not able to understand if your demo works or not. :(
     
  3. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    I'll post the how to video today.
     
  4. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    The video added includes the details about each example scene.

    The last scene is the word detection scene and the video explains how to use it.
     
  5. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    A user wrote in about a few words not being detected. Here we do word detection for "Test", "Weasel", and "Rutabaga".


    Make sure your mic volume is set high enough to hear yourself. If the gain is too low, it is tough to distinguish words from background noise.
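
    As a rough illustration of the gain advice above, the sketch below compares the RMS level of a spoken sample with a noise-only sample. It is not taken from the package: `rms`, `is_loud_enough`, and the ratio of 4.0 are hypothetical names and values chosen for illustration, and the sample data is made up.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples in the range -1..1."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_loud_enough(speech, noise, min_ratio=4.0):
    """True when speech is at least min_ratio times louder than the noise.

    min_ratio is an assumed threshold, not a value from the package.
    """
    noise_level = rms(noise)
    if noise_level == 0.0:
        return rms(speech) > 0.0
    return rms(speech) / noise_level >= min_ratio

# Made-up data: constant-level noise and two recordings at different gains.
noise = [0.01, -0.01] * 50
quiet_speech = [0.02, -0.02] * 50
loud_speech = [0.5, -0.5] * 50

print(is_loud_enough(quiet_speech, noise))  # False
print(is_loud_enough(loud_speech, noise))   # True
```

    If the check fails for a normal speaking voice, raising the microphone gain is likely to help more than re-recording the profile.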
     
  6. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    I'm not doing any noise removal on the profiles, which I bet would increase the accuracy.
     
  7. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    In the word detection example, clicking the set button after saying the word is a bit counterintuitive. I'll change that to a push-to-record button.
     
  8. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    I've improved demo 1.2 by adding noise filtering, push to talk, and trimming. You can record profiles faster, and you can play back the sample, which helps if you forgot how you said the word last time.

    http://theylovegames.com/WordDetection_1_2.html
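
    The post does not say how the trimming works. One simple way to trim a recording, shown here only as an assumption, is to drop leading and trailing samples below an amplitude threshold. `trim_silence` and the 0.05 threshold are hypothetical.

```python
def trim_silence(samples, threshold=0.05):
    """Drop leading and trailing samples whose absolute value is below threshold.

    Quiet samples in the middle of the recording are kept.
    """
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

recording = [0.0, 0.01, 0.3, -0.4, 0.02, 0.5, 0.01, 0.0]
print(trim_silence(recording))  # [0.3, -0.4, 0.02, 0.5]
```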
     
  9. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
  10. cel

    cel

    Joined:
    Feb 15, 2011
    Posts:
    46
    is it me or this doesn't work that well in the webplayer... is accuracy better in standalone?
     
  11. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    The accuracy is the same on all platforms.

    I am in the process of improving the noise filtering algorithm.

    My current theory is that samples are similar because they all have the same background noise.

    Audacity has great noise filtering, I'll review how that works.
    http://audacity.sourceforge.net/download/source
    src/effects/NoiseRemoval.h
    src/effects/NoiseRemoval.cpp

    In the meantime, make your audio commands distinct by enunciating while I improve the accuracy.
     
    Last edited: Sep 25, 2012
  12. cel

    cel

    Joined:
    Feb 15, 2011
    Posts:
    46
    ok... hope you can improve it... it's a genial idea and i'm eager to try out... a few ideas pop to mind... any idea about the price?
    keep up the good work!!! :)
     
  13. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    This is going to be an inexpensive alternative to an expert system. The price is going to start out at $30, and I'll integrate any accuracy tips from the community to make this a stellar multi-platform product. The 1.0 product should be available any day, as it's currently under review.
     
    Last edited: Sep 25, 2012
  14. fpspro9001

    fpspro9001

    Joined:
    Sep 26, 2012
    Posts:
    24
    does tool support different languages? i have interest
     
  15. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Any audio would be supported. It's not specific to languages. Instruments would work as well. As long as the samples are unique.
     
  16. fpspro9001

    fpspro9001

    Joined:
    Sep 26, 2012
    Posts:
    24
    thanks for info :) could game for music...
     
  17. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Music is trickier. I bet the RIAA or MPAA has some algorithms for that. The thought did cross my mind, whether it could detect Skrillex music based on a profile. The intent here is more to use word detection as a controller, to fire an event.

    Although there's no reason why you couldn't train it with common words to do dictation.
     
  18. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    One tiny snag: I need to rewrite the FFT algorithm in C#. The Asset Store only supports licenses for MIT, Creative Commons, and Simplified BSD (BSD3). The Fourier transform that I was using was under the LGPL, which is too restrictive.
     
  19. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    The FFT algorithm has been rewritten and replaced.

    This package has been accepted in the Asset Store:
    http://u3d.as/3pP

    Now that the package has been accepted, I can move forward with better noise removal logic.
     
  20. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    I still have plans to implement the Audacity noise removal algorithm which has great noise matching. And I figure that if it can accurately match a noise profile, it might also have better pattern matching.

    The accuracy could increase if you push to talk on the voice command. That way the length of the word would be a better match. Right now it's matching on the first few syllables which generally matches all words.

    Another demo could be dictation if we train using a bunch of letter combinations. St, Th, Ch, ABC. This will need a wave analysis algorithm in addition to spectrum analysis.

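    The fingerprinting process takes the original wave and runs an FFT algorithm on it. A wave is amplitude over time. The length of one cycle of the wave determines the frequency: the shorter the cycle, the higher the frequency. The FFT measures how much of each frequency is present. The math is tricky because a real wave is the sum of many frequencies, so you get a table of strengths, one per frequency, rather than a single value.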

    The fingerprint is all the frequencies of the word over the whole length of the word. That is not as accurate as splitting the word into small chunks and computing the frequencies of each segment, which would give a more accurate fingerprint.
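
    To make the chunking idea concrete, here is a small sketch. It is not the package's code: it uses a naive DFT in place of an FFT to stay short, and `magnitude_spectrum`, `chunked_fingerprint`, and `dominant_bin` are hypothetical names. The made-up "word" is a low tone followed by a higher tone; the per-chunk spectra keep the order of the two tones, which a single spectrum over the whole word cannot.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2); an FFT gives the same values faster)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags

def chunked_fingerprint(samples, chunk_size):
    """One magnitude spectrum per full chunk; a trailing partial chunk is ignored."""
    return [
        magnitude_spectrum(samples[i:i + chunk_size])
        for i in range(0, len(samples) - chunk_size + 1, chunk_size)
    ]

def dominant_bin(spectrum):
    """Index of the strongest bin, skipping the DC bin at index 0."""
    return max(range(1, len(spectrum)), key=lambda k: spectrum[k])

# A made-up "word": 2 cycles per chunk, then 6 cycles per chunk, 32 samples each.
n = 32
low = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
high = [math.sin(2 * math.pi * 6 * t / n) for t in range(n)]
word = low + high

chunks = chunked_fingerprint(word, n)
print([dominant_bin(c) for c in chunks])  # [2, 6]
```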

    The fingerprint process happens in Example4.cs as SetProfile. And then the selecting algorithm happens in WordDetection.cs.
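
    The thread does not show what the selecting algorithm in WordDetection.cs actually does. Purely as an assumption, one common way to pick a word is a nearest-neighbour comparison between the incoming spectrum and each stored profile, with the Noise profile acting as the "nothing was said" answer. All names and numbers below are made up for illustration.

```python
import math

def normalize(spectrum):
    """Scale a spectrum to unit length so loudness does not dominate the comparison."""
    length = math.sqrt(sum(v * v for v in spectrum))
    return [v / length for v in spectrum] if length > 0 else list(spectrum)

def distance(a, b):
    """Euclidean distance between two equal-length spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_word(profiles, spectrum):
    """Return the label of the stored profile nearest to the incoming spectrum."""
    target = normalize(spectrum)
    return min(profiles, key=lambda label: distance(normalize(profiles[label]), target))

# Hypothetical 4-bin profiles; real spectra would have many more bins.
profiles = {
    "Noise": [1.0, 1.0, 1.0, 1.0],
    "attack": [0.0, 9.0, 1.0, 0.0],
    "defend": [0.0, 1.0, 0.0, 9.0],
}

print(closest_word(profiles, [0.1, 4.0, 0.6, 0.2]))  # attack
print(closest_word(profiles, [0.5, 0.4, 0.5, 0.6]))  # Noise
```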

    WordDetails.cs holds the Wave, and the Spectrum. Although I should add a List<Spectrum> for the profile chunks.

    I would also need to modify FourierTransform.cs to work over the range of each chunk within the original array, without making a bunch of array copies.
     
  21. sonicviz

    sonicviz

    Joined:
    May 19, 2009
    Posts:
    1,051
    Hi,
    Is there a release notes as to what has changed in the 1.3 update?

    ty!
     
  22. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    I wanted to port back to Unity 3.4.2 to get access to publishing an Android Live Wallpaper. Unfortunately, I could only port back as far as Unity 3.5.0, because that is the version that added the microphone interface.

    1.3 is the same as 1.2, published with Unity 3.5.0.
     
  23. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Feel free to make feature requests. I'm trying different accuracy methods. I'm going to add another example where you push to talk when trying to match a profile.
     
  24. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
  25. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Version 1.4 is now available, which adds a push to talk demo.
     
  26. goat

    goat

    Joined:
    Aug 24, 2009
    Posts:
    5,182
    How does this compare to the built-in Voice Command on iOS? Speech to Text on Android? Can we supply our own custom dictionaries? Is there a banned word list that it will ignore?
     
  27. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    The difference is that this is a cross-platform plugin. It's not going to provide you with Speech to Text yet. It requires users to record a dictionary of words that it will detect. Yes, you can provide words to ignore.

    The more users and feedback I get, the more polish I'll be able to add. Perhaps even some controls for editing the ignore list. Most of this is controlled via scripts. It's almost time for more videos to explain the 6 example scenes in detail.
     
  28. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Just for the fun of it, another use of spectral analysis is to hook up an Emotiv headset and run it through the matching algorithm.

    http://emotiv.com/

    Technically it's still word detection, it just comes from your mind!

    http://emotiv.com/store/apps/applications/117/4021

    Blinking is nice to move around. But I want to be able to think of a word and detect that.
     
  29. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Here is a very interesting video about the history of speech recognition. The problem with the most advanced approach is that it requires a network connection to a server that has a massive database to find a match.


    The advantage of my system is that it doesn't require a network connection or a large data set in order to detect the patterns from waveforms.
     
  30. runner

    runner

    Joined:
    Jul 10, 2010
    Posts:
    865
    Have you run this thing through the profiler to measure overall performance? The main reason I ask is that I would like to include it, but resources are rather tight in my full game.
     
  31. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    In the demo examples on the mobile devices, the performance isn't ideal because it's drawing the textures to plot the graph. Disabling the plot would make the performance much better. I'll put it on the todo list. In the meantime, you can remove any reference to textures in the example code.
     
  32. runner

    runner

    Joined:
    Jul 10, 2010
    Posts:
    865
    Okay, it's not mobile but PC. Can we have it so that push to talk is always held down, as a talk mode for verbal detection? And what I meant in the previous post was that running this alongside my game might just be too much to ask performance-wise.
     
  33. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    The difference between the last two example scenes is that one is talk mode, the other is push to talk.
     
  34. runner

    runner

    Joined:
    Jul 10, 2010
    Posts:
    865
    One last dumb question: does it save and load the word list? That is, do users of the game need to enter their words every time they launch the game, or is that something I will need to code up?
     
  35. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Feel free to ask away.

    Saving is not currently supported.

    I'll add saving profiles to the todo list. Currently it's just binary data, which could be converted to a base64 string and saved into player prefs. You could save to the device's local storage. In standalone you could save the profiles into the user data. Or you could even publish the data to a server using WWW and a POST. The word details object holds the profile information, which just needs to be serialized and saved.
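
    A minimal sketch of the base64 idea, assuming a profile is simply a list of floats (the real layout of the word details is not shown in this thread). `profile_to_base64` and `profile_from_base64` are hypothetical helper names. In Unity the resulting string is the kind of value that could be stored with PlayerPrefs.SetString.

```python
import base64
import struct

def profile_to_base64(spectrum):
    """Pack floats as little-endian 32-bit values and encode them as a base64 string."""
    raw = struct.pack("<%df" % len(spectrum), *spectrum)
    return base64.b64encode(raw).decode("ascii")

def profile_from_base64(text):
    """Inverse of profile_to_base64: decode the string back into a list of floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))

# These values are exactly representable as 32-bit floats, so the round trip is exact.
saved = profile_to_base64([0.0, 0.5, 1.0, 0.25])
print(saved)
print(profile_from_base64(saved))  # [0.0, 0.5, 1.0, 0.25]
```

    Values that are not exactly representable in 32 bits come back slightly rounded, which should not matter for spectrum matching.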
     
    Last edited: Nov 13, 2012
  36. runner

    runner

    Joined:
    Jul 10, 2010
    Posts:
    865
    purchased yesterday

    samples projects work and the word detection is pretty good at detecting words.
    ummm the set button for setting word profiles is kind of confusing at first.
     
  37. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Thanks for purchasing and welcome to our club. I'd like to see how users are using this package in their demos.

    The Example scripts are purely examples. And you can always take those to customize and build your own UI with something fancier like NGUI.

    There's no reason why you need to show the spectrum data as a texture, as that was just an example.


    Yeah I could always add a custom panel for easy configuration to see what's going on.

    Right now it's modular while I get the algorithms polished.

    @Todo:
    - Saving profiles
    - Make a clean Editor panel for setup
    - Performance: hide the mic plot graphs by default and add a toggle button to get better performance on devices.
    - Create an android wallpaper that uses this package
     
    Last edited: Nov 17, 2012
  38. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
  39. ina

    ina

    Joined:
    Nov 15, 2010
    Posts:
    1,080
    curious about your results as well!
     
  40. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    A user wants to be able to detect drum beats and other instruments. Adding to the todo list.
     
  41. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Here is another potential use for verbal commands:




    Yet another potential use:

     
    Last edited: Dec 15, 2012
  42. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Feel free to pm me to subscribe and get an early alpha drop of 1.5 which includes profile saving...
     
    Last edited: Dec 15, 2012
  43. toggiee

    toggiee

    Joined:
    Jan 5, 2012
    Posts:
    7
    I have tried on Android and it is working perfectly. I can save words into player prefs and then I can load from there. Thank you :)
     
  44. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Ah cool thanks for your interest toggiee. Here's a demo of 1.5 alpha showing player pref saving in the browser.

    http://theylovegames.com/WordDetection_1_5.html
     
  45. jabuka

    jabuka

    Joined:
    Mar 24, 2012
    Posts:
    4
  46. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Technically, tones are easy to detect with a Fourier transform. That converts the wave data into frequencies, like 440 Hz, the A above middle C on the piano. There are specific tone ranges to detect a dial tone.

    That said just use the example and add a word profile for each tone.
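
    For a single pure tone, the frequency can be read straight off the transform. The sketch below is an illustration only: it uses a naive DFT rather than the package's FFT, and `dominant_frequency` is a hypothetical name. The sample count is chosen so that 440 Hz falls exactly on a bin.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(total) > best_mag:
            best_bin, best_mag = k, abs(total)
    return best_bin * sample_rate / n

sample_rate = 8000
n = 400  # bins are 8000 / 400 = 20 Hz apart, so 440 Hz lands exactly on bin 22
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(n)]
print(dominant_frequency(tone, sample_rate))  # 440.0
```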
     
  47. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Version 1.5 is now in the asset store.
     
  48. Foriero

    Foriero

    Joined:
    Jan 24, 2012
    Posts:
    584
    Would it be possible to add to your package singing pitch recognition?

    For example a child has a task to sing A and we would like to react if he/she is on the pitch or off the pitch.

    We are creating music apps and your FFT solution is exactly what we need. (We already have this solution but we are not very precise)
     
    Last edited: Dec 21, 2012
  49. theylovegames

    theylovegames

    Joined:
    Aug 18, 2012
    Posts:
    176
    Exactly. You are talking about the FFT output. To detect pitch (frequency) you are just looking for values above a certain threshold in the FFT output. You get more precise by increasing the sample size. The first 3 examples in this package display the spectral output.

    If you normalize the FFT output, the values should be close to 1 around the pitch you are looking for.
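
    The link between sample size and precision is that each output bin covers sample rate divided by sample count Hz. The sketch below illustrates that with a naive DFT and a threshold on the normalized output. It is not the package's code; `normalized_spectrum`, `frequencies_above`, `bin_width`, and the 0.5 threshold are assumptions for illustration.

```python
import cmath
import math

def normalized_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2, scaled so the strongest bin is 1.0."""
    n = len(samples)
    mags = [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
    peak = max(mags)
    return [m / peak for m in mags] if peak > 0 else mags

def frequencies_above(samples, sample_rate, threshold=0.5):
    """Bin frequencies in Hz whose normalized magnitude reaches the threshold."""
    n = len(samples)
    return [
        k * sample_rate / n
        for k, value in enumerate(normalized_spectrum(samples))
        if value >= threshold
    ]

def bin_width(sample_rate, sample_count):
    """Width of one frequency bin in Hz."""
    return sample_rate / sample_count

def make_tone(frequency, sample_rate, count):
    """Pure sine tone, used here as made-up test input."""
    return [math.sin(2 * math.pi * frequency * t / sample_rate) for t in range(count)]

sample_rate = 8000
short = make_tone(440, sample_rate, 100)
long = make_tone(440, sample_rate, 400)

print(bin_width(sample_rate, 100))  # 80.0
print(bin_width(sample_rate, 400))  # 20.0
# With 80 Hz bins, 440 Hz lies between the 400 Hz and 480 Hz bins.
print(frequencies_above(short, sample_rate))
print(frequencies_above(long, sample_rate))  # [440.0]
```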
     
  50. Foriero

    Foriero

    Joined:
    Jan 24, 2012
    Posts:
    584
    Well If you add next to your WordDetection.cs also PitchDetection.cs I think it would be nice addition also for other developers and we certainly will buy your package.

    We currently have an issue: we can detect pitch, but not to the right octave. That means if I ask a child to sing c4 or c3, we cannot tell with certainty whether he/she is singing c4 or c3. All we get from our FFT solution is that the child is more or less singing c, but the result is sometimes c2 or c3 even when the child is singing c4.

    So what we need is just your PitchDetection.cs. Tell it StartDetecting(), StopDetecting() since it is not necessary to have it running all the time the app is on. The detected pitch should be and certainly is in Hz
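
    For reference, once a pitch in Hz is available, turning it into a note name with an octave number is straightforward. The sketch below assumes equal temperament with A4 = 440 Hz; `note_from_hz` is a hypothetical helper, not part of the package. It only covers the Hz-to-note mapping and does not by itself resolve the c3 versus c4 confusion described above, which comes from the detected frequency itself.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_from_hz(frequency, a4=440.0):
    """Nearest equal-tempered note name with octave, plus the offset in cents."""
    midi = 69 + 12 * math.log2(frequency / a4)  # MIDI note 69 is A4
    nearest = round(midi)
    cents = round((midi - nearest) * 100)
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents

print(note_from_hz(440.0))   # ('A4', 0)
print(note_from_hz(261.63))  # ('C4', 0)
print(note_from_hz(130.81))  # ('C3', 0)
```

    The cents value indicates how far off pitch the singer is: negative means flat, positive means sharp.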

    Please let me know if it is possible for you to add this PitchDetection.cs script to your package. As I said, I think that other music developers would also like to have this solution.

    Many thanks, Marek