Word Detection - Verbal Commands

theylovegames · Mar 2, 2017

WebGL Speech Detection

WebGL Speech Synthesis

Word Detection 1.10
http://u3d.as/3pP

Assign verbal commands to your gameplay on Standalone, Web Player, Android and iOS. Control your game characters just by talking. Detection is standalone and does not require a network connection. Add verbal commands at runtime.

Controlling the Character Controller with Word Detection -

Controlling a talking head with Word Detection -

Instructions -

1. Label the first item as Noise (as this will just capture background noise most of the time)

2. Label each word that you add to make your life easier

3. Wait 1 second before speaking to let only the noise fill in

4. Speak the word you aim to detect

5. Immediately after speaking the word, click the button next to the label, as that will build a profile

6. Always try to speak the word the same way, in the same tone and at the same volume

7. If you have trouble matching a word, just repeat the word and click the valid button to update the word details

8. Try to use dissimilar words. Avoid near-duplicates (go, go go, gos) and prefer distinct words (attack, defend, run, escape, affirmative, acknowledged, zzzz)

9. You can remove a word by clicking the remove button

[Demo 1.7] - Word detection can play Mecanim states
[Demo 1.8] - Word profiles can be saved and loaded
[Demo 1.9] - Word detection manipulates retro head
[Demo 1.10] - Word detection selects morph maps
[Demo 1.14] - Word detection manipulates blend shapes
[Demo 1.16] - Use word detection to play goat media clips
[Demo 1.17] - Use word detection to drive the character controller

kenshin · Sep 23, 2012

The idea is cool ...but honestly I was not able to understand if your demo works or not.

theylovegames · Sep 23, 2012

I'll post the how-to video today.

theylovegames · Sep 23, 2012

The video added includes the details about each example scene.

The last scene is the word detection scene and the video explains how to use it.

theylovegames · Sep 23, 2012

A user wrote in about not detecting a few words, so here we do word detection for "Test", "Weasel", and "Rutabaga".

Make sure your mic volume is set high enough to hear yourself. If the gain is too low, it is tough to distinguish words from background noise.
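As a rough way to check whether the gain is high enough, the level of a recorded word can be compared against the level of the background noise. The sketch below is only an illustration in Python, not code from the package; the function names `rms` and `snr_db` and the sample values are made up for the example.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(word, noise):
    """Signal-to-noise ratio in decibels between a word and a noise recording."""
    noise_level = rms(noise)
    word_level = rms(word)
    if noise_level == 0.0:
        return float("inf")
    if word_level == 0.0:
        return float("-inf")
    return 20.0 * math.log10(word_level / noise_level)

# Hypothetical recordings: the same word at low gain and at high gain.
noise = [0.01 * math.sin(0.5 * i) for i in range(1000)]
quiet_word = [0.02 * math.sin(0.3 * i) for i in range(1000)]
loud_word = [0.5 * math.sin(0.3 * i) for i in range(1000)]

print("quiet word SNR (dB):", snr_db(quiet_word, noise))
print("loud word SNR (dB):", snr_db(loud_word, noise))
```

A word recorded only a few decibels above the noise floor will look a lot like the Noise profile, which is why raising the gain helps.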

theylovegames · Sep 23, 2012

I'm not doing any noise removal on the profiles, which I bet would increase the accuracy.

theylovegames · Sep 24, 2012

In the word detection example, clicking the set button after saying the word is a bit counterintuitive. I'll change that to a push-to-record button.

theylovegames · Sep 24, 2012

I've improved demo 1.2 by adding noise filtering, push to talk, and trimming. You can record profiles faster, and you can play back the sample, which helps if you forget how you said the word last time.

http://theylovegames.com/WordDetection_1_2.html
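The thread does not say how the trimming in demo 1.2 works. One simple possibility, sketched here in Python purely as an illustration, is to drop the quiet samples at the start and end of a recording. The function name `trim_silence` and the threshold value are assumptions, not part of the package.

```python
def trim_silence(samples, threshold=0.05):
    """Return samples with leading and trailing values below threshold removed."""
    start = 0
    end = len(samples)
    # Skip quiet samples at the front.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Skip quiet samples at the back.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

recording = [0.0, 0.01, 0.3, -0.4, 0.02, 0.2, 0.0]
print(trim_silence(recording))
```

Quiet samples in the middle of the word are kept, so only the padding around the word is removed.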

theylovegames · Sep 24, 2012

In demo 1.3 you can use voice commands to move a cube around.
http://theylovegames.com/WordDetection_1_3.html

cel · Sep 25, 2012

is it me or this doesn't work that well in the webplayer... is accuracy better in standalone?

theylovegames · Sep 25, 2012

cel said: ↑

is it me or this doesn't work that well in the webplayer... is accuracy better in standalone?

The accuracy is the same on all platforms.

I am in the process of improving the noise filtering algorithm.

My current theory is that samples are similar because they all have the same background noise.
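If that theory holds, one textbook way to act on it is spectral subtraction: record a noise-only spectrum and subtract it, bin by bin, from each word spectrum. The Python sketch below illustrates only that general idea. It is not the package's filter and not Audacity's algorithm, and the name `subtract_noise` is made up for the example.

```python
def subtract_noise(word_spectrum, noise_spectrum):
    """Subtract a noise magnitude spectrum from a word magnitude spectrum.

    Both inputs are lists of per-bin magnitudes of equal length.
    Results are clamped at zero because a magnitude cannot be negative.
    """
    if len(word_spectrum) != len(noise_spectrum):
        raise ValueError("spectra must have the same number of bins")
    return [max(w - n, 0.0) for w, n in zip(word_spectrum, noise_spectrum)]

# Hypothetical magnitudes for three frequency bins.
word = [1.0, 0.5, 0.2]
noise = [0.25, 0.5, 0.5]
print(subtract_noise(word, noise))
```

After subtraction, bins that held only background noise drop to zero, so two different words no longer look alike just because they share the same noise floor.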

Audacity has great noise filtering, I'll review how that works.
http://audacity.sourceforge.net/download/source
src/effects/NoiseRemoval.h
src/effects/NoiseRemoval.cpp

In the meantime, make your audio commands distinct by enunciating clearly while I improve the accuracy.

cel · Sep 25, 2012

ok... hope you can improve it... it's a genial idea and i'm eager to try out... a few ideas pop to mind... any idea about the price?
keep up the good work!!!

theylovegames · Sep 25, 2012

cel said: ↑

ok... hope you can improve it... it's a genial idea and I'm eager to try out... a few ideas pop to mind... any idea about the price?
keep up the good work!!!

This is going to be an alternative inexpensive solution to an expert system. The price is going to start out at $30, where I'll integrate any accuracy tips from the community and make this a stellar multi-platform product. The 1.0 product should be available any day as it's under review currently.

fpspro9001 · Sep 26, 2012

does tool support different languages? i have interest

theylovegames · Sep 26, 2012

fpspro9001 said: ↑

does tool support different languages? i have interest

Any audio would be supported. It's not specific to languages. Instruments would work as well, as long as the samples are unique.

fpspro9001 · Sep 26, 2012

theylovegames said: ↑

Any audio would be supported. It's not specific to languages. Instruments would work as well, as long as the samples are unique.

thanks for info could game for music...

theylovegames · Sep 26, 2012

fpspro9001 said: ↑

thanks for info could game for music...

Music is trickier. I bet the RIAA or MPAA has some algorithms for that. The thought did cross my mind, whether it could detect Skrillex music based on a profile. The intent here is more to use word detection as a controller, to fire an event.

Although there's no reason why you couldn't train it with common words to do dictation.

theylovegames · Sep 27, 2012

One tiny snag: I need to rewrite the FFT algorithm in C#. The asset store only supports licenses for MIT, Creative Commons, and Simplified BSD(BSD3). The Fourier transform that I was using was under LGPL, which is too restrictive.

theylovegames · Oct 1, 2012

theylovegames said: ↑

One tiny snag: I need to rewrite the FFT algorithm in C#. The asset store only supports licenses for MIT, Creative Commons, and Simplified BSD(BSD3). The Fourier transform that I was using was under LGPL, which is too restrictive.

The FFT algorithm has been rewritten and replaced.

This package has been accepted in the Asset Store:
http://u3d.as/3pP

Now that the package has been accepted, I can move forward with better noise removal logic.

theylovegames · Oct 2, 2012

I still have plans to implement the Audacity noise removal algorithm which has great noise matching. And I figure that if it can accurately match a noise profile, it might also have better pattern matching.

The accuracy could increase if you push to talk on the voice command. That way the length of the word would be a better match. Right now it's matching on the first few syllables which generally matches all words.

Another demo could be dictation if we train using a bunch of letter combinations. St, Th, Ch, ABC. This will need a wave analysis algorithm in addition to spectrum analysis.

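The fingerprinting process takes the original wave and runs an FFT algorithm on it. A wave is amplitude over time. The length of one cycle of the wave determines the frequency: the shorter the cycle, the higher the frequency. The FFT measures how much of each frequency is present. The math is tricky because a wave is the sum of many frequencies, so what you get back is a table of strengths, one per frequency bin.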
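To make the "table of strengths" idea concrete, here is a small Python sketch. It uses a naive discrete Fourier transform rather than a real FFT (same output, just slower), and it is an illustration only, not the code in FourierTransform.cs. The name `magnitude_spectrum` and the test wave are assumptions for the example.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive O(n^2) discrete Fourier transform.

    Returns one magnitude per frequency bin for bins 0..n/2,
    scaled by the number of samples.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = 0j
        for t, x in enumerate(samples):
            total += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(total) / n)
    return spectrum

# A wave that is the sum of two sines: 3 cycles and 5 cycles per window.
n = 64
wave = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 5 * t / n)
        for t in range(n)]
spectrum = magnitude_spectrum(wave)

# List the bins that hold a noticeable share of the energy.
for k, m in enumerate(spectrum):
    if m > 0.01:
        print("bin", k, "magnitude", round(m, 3))
```

Only bins 3 and 5 stand out, with the second at half the strength of the first, which mirrors how the wave was built.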

The fingerprint is all the frequencies of the word over the length of the word. That is not as accurate as splitting the word into small chunks and computing the frequencies of each segment, which would give a more accurate fingerprint.

The fingerprinting happens in SetProfile in Example4.cs, and the selecting algorithm then runs in WordDetection.cs.
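The thread does not spell out how the selecting algorithm scores a match. As a hedged illustration, the Python sketch below assumes the simplest option: pick the stored profile whose spectrum has the smallest Euclidean distance to the incoming spectrum. The names `distance` and `closest_word` and all the numbers are hypothetical, and the real WordDetection.cs may work differently.

```python
import math

def distance(a, b):
    """Euclidean distance between two spectra with the same number of bins."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_word(spectrum, profiles):
    """Return the label of the stored profile nearest to spectrum.

    profiles maps a label to its spectrum. Returns None if it is empty.
    """
    best_label = None
    best_distance = float("inf")
    for label, profile in profiles.items():
        d = distance(spectrum, profile)
        if d < best_distance:
            best_label, best_distance = label, d
    return best_label

# Hypothetical four-bin profiles, including the Noise entry from the instructions.
profiles = {
    "Noise": [0.1, 0.1, 0.1, 0.1],
    "attack": [0.9, 0.2, 0.1, 0.0],
    "run": [0.1, 0.2, 0.8, 0.3],
}
print(closest_word([0.8, 0.3, 0.1, 0.1], profiles))
```

Keeping a Noise profile in the table means that silence or background hum has something to match against instead of being forced onto a real command.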

WordDetails.cs holds the Wave and the Spectrum, although I should add a List<Spectrum> for the profile chunks.

I would then need to modify FourierTransform.cs to work over the range of each chunk, without making a bunch of array copies.
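A chunked fingerprint along these lines could look roughly like the Python sketch below. It reads each chunk in place through a start offset and a length, so no per-chunk copies of the wave are made. This is an illustration under assumptions, not the planned change to FourierTransform.cs; `range_spectrum` and `chunked_fingerprint` are made-up names, and a naive transform stands in for the FFT.

```python
import cmath
import math

def range_spectrum(samples, start, length):
    """Magnitude spectrum of samples[start:start+length], read in place by index."""
    spectrum = []
    for k in range(length // 2 + 1):
        total = 0j
        for t in range(length):
            total += samples[start + t] * cmath.exp(-2j * math.pi * k * t / length)
        spectrum.append(abs(total) / length)
    return spectrum

def chunked_fingerprint(samples, chunk_size):
    """One spectrum per full chunk. Trailing samples that do not fill a chunk are ignored."""
    return [range_spectrum(samples, start, chunk_size)
            for start in range(0, len(samples) - chunk_size + 1, chunk_size)]

# A made-up "word" whose first half is a low tone and second half a higher tone.
low = [math.sin(2 * math.pi * 2 * t / 32) for t in range(32)]
high = [math.sin(2 * math.pi * 6 * t / 32) for t in range(32)]
fingerprint = chunked_fingerprint(low + high, 32)

for i, chunk in enumerate(fingerprint):
    print("chunk", i, "strongest bin", chunk.index(max(chunk)))
```

A single spectrum over the whole word would show both tones at once and lose their order. The chunked version keeps the order, which is the extra accuracy the post is after.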

sonicviz · Oct 16, 2012

Hi,
Is there a release notes as to what has changed in the 1.3 update?

ty!

theylovegames · Oct 16, 2012

sonicviz said: ↑

Hi,
Is there a release notes as to what has changed in the 1.3 update?

ty!

I wanted to port back to Unity 3.4.2 to get access to publishing an Android Live Wallpaper. Unfortunately, I could only port back to Unity 3.5.0 which added the microphone interface.

1.3 is the same as 1.2, republished with Unity 3.5.0.

theylovegames · Oct 16, 2012

Feel free to make feature requests. I'm trying different accuracy methods. I'm going to add another example where you push to talk when trying to match a profile.

theylovegames · Oct 20, 2012

In 1.4, with push to talk, the voice samples line up better, which increases accuracy.
http://theylovegames.com/WordDetection_1_4.html

theylovegames · Oct 22, 2012

Version 1.4 is now available, which adds a push to talk demo.

goat · Oct 25, 2012

How does this compare to the built-in Voice Command on iOS? Speech to Text on Android? Can we supply our own custom dictionaries? Is there a banned word list that it will ignore?

theylovegames · Oct 25, 2012

goat said: ↑

How does this compare to the built-in Voice Command on iOS? Speech to Text on Android? Can we supply our own custom dictionaries? Is there a banned word list that it will ignore?

The difference is that this is a cross-platform plugin. It's not going to provide you with Speech to Text yet. It requires users to record a dictionary of words that it will detect. Yes, you can provide words to ignore.

The more users and feedback I get, I'll be able to add more polish. Perhaps even some controls for editing the ignore list. Most of this is controlled via scripts. It's almost time for more videos to explain the 6 example scenes in detail.

theylovegames · Nov 6, 2012

Just for the fun of it, another use of spectral analysis is to hook up an Emotiv headset and run it through the matching algorithm.

http://emotiv.com/

Technically it's still word detection, it just comes from your mind!

http://emotiv.com/store/apps/applications/117/4021

Blinking is nice to move around. But I want to be able to think of a word and detect that.

theylovegames · Nov 8, 2012

Here is a very interesting video about the history of speech recognition. The problem with the most advanced approach is that it requires a network connection to a server that has a massive database to find a match.

The advantage of my system is that it doesn't require a network connection or a large data set in order to detect the patterns from waveforms.

runner · Nov 12, 2012

Have you run this thing through the profiler to measure overall performance? I ask mainly because I would like to include it, but resources are rather tight in my full game.

theylovegames · Nov 12, 2012

In the demo examples on the mobile devices, the performance isn't ideal because it's drawing the textures to plot the graph. Disabling the plot would make the performance much better. I'll put it on the todo list. In the meantime, you can remove any reference to textures in the example code.

runner · Nov 12, 2012

theylovegames said: ↑

In the demo examples on the mobile devices.

okay it's not mobile but pc, And can we have it so the push to talk is always held down in talk mode for verbal detection ? And meant from the previous post that performance between my game and this might just be too much for the asking.

theylovegames · Nov 12, 2012

runner said: ↑

okay it's not mobile but pc, And can we have it so the push to talk is always in talk mode for verbal detection ?
and meant from the previous post that performance between my game and this might just be too much for the asking.

The difference between the last two example scenes is that one is talk mode, the other is push to talk.

runner · Nov 12, 2012

theylovegames said: ↑

The difference between the last two example scenes is that one is talk mode, the other is push to talk.

One last dumb question: does it save and load the word list? Do users of the game need to enter their words every time they launch the game, or is that something I will need to code up?

theylovegames · Nov 13, 2012

runner said: ↑

One last dumb question: does it save and load the word list? Do users of the game need to enter their words every time they launch the game, or is that something I will need to code up?

Feel free to ask away.

Saving is not currently supported.

I'll add saving profiles to the todo list. Currently it's just binary data which could be converted to a base64 string and saved into player prefs. You could save in a device local storage. In standalone you could save the profiles into the user data. Or you could even publish the data to a server using WWW and a POST. The word details holds the profile information which just needs to be serialized and saved.
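The base64 route can be sketched in a few lines. The Python below is only an illustration of the idea. It assumes a profile is a flat list of floats stored as 32-bit values, uses a plain dictionary named `prefs` as a stand-in for player prefs, and the function names and the key `word.attack` are made up. The actual layout of the word details in the package may differ.

```python
import base64
import struct

def encode_profile(spectrum):
    """Pack floats as little-endian 32-bit values and return a base64 text string."""
    raw = struct.pack("<%df" % len(spectrum), *spectrum)
    return base64.b64encode(raw).decode("ascii")

def decode_profile(text):
    """Reverse encode_profile: base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    count = len(raw) // 4  # 4 bytes per 32-bit float
    return list(struct.unpack("<%df" % count, raw))

# A dictionary standing in for a string key/value store such as player prefs.
prefs = {}
prefs["word.attack"] = encode_profile([0.5, 0.25, 0.0])
restored = decode_profile(prefs["word.attack"])
print(prefs["word.attack"], restored)
```

Because the encoded profile is plain text, the same string could also be written to a local file or sent to a server in a POST body.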

runner · Nov 17, 2012

purchased yesterday

samples projects work and the word detection is pretty good at detecting words.
ummm the set button for setting word profiles is kind of confusing at first.

theylovegames · Nov 17, 2012

runner said: ↑

purchased yesterday

samples projects work and the word detection is pretty good at detecting words.
ummm the set button for setting word profiles is kind of confusing at first.

Thanks for purchasing and welcome to our club. I'd like to see how users are using this package in their demos.

The Example scripts are purely examples. And you can always take those to customize and build your own UI with something fancier like NGUI.

There's no reason why you need to show the spectrum data as a texture, as that was just an example.

Yeah I could always add a custom panel for easy configuration to see what's going on.

Right now it's modular while I get the algorithms polished.

@Todo:
- Saving profiles
- Make a clean Editor panel for setup
- Performance: hide the mic plot graphs by default and add a toggle button, to get better performance on devices.
- Create an android wallpaper that uses this package

theylovegames · Nov 27, 2012

I'll have to try MindWave Mobile for input and see if the verbal detection works. I'll let you know if I can "think" the words in the example demos on my Nexus 10.

http://store.neurosky.com/products/mindwave-mobile

ina · Dec 1, 2012

curious about your results as well!

theylovegames · Dec 1, 2012

A user wants to be able to detect drum beats and other instruments. Adding to the todo list.

theylovegames · Dec 15, 2012

Here is another potential use for verbal commands:

Yet another potential use:

theylovegames · Dec 15, 2012

Feel free to pm me to subscribe and get an early alpha drop of 1.5 which includes profile saving...

toggiee · Dec 16, 2012

I have tried on Android and it is working perfectly. I can save words into player prefs and then I can load from there. Thank you

theylovegames · Dec 16, 2012

toggiee said: ↑

I have tried on Android and it is working perfectly. I can save words into player prefs and then I can load from there. Thank you

Ah cool thanks for your interest toggiee. Here's a demo of 1.5 alpha showing player pref saving in the browser.

http://theylovegames.com/WordDetection_1_5.html

jabuka · Dec 17, 2012

How can I recognize DTMF sounds from sound card using PC with phone PCI card?

http://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling

theylovegames · Dec 17, 2012

jabuka said: ↑

How can I recognize DTMF sounds from sound card using PC with phone PCI card?

http://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling

Technically, tones are easy to detect with a Fourier transform, which converts the wave data into frequencies, like the 440 Hz A on the piano. Each DTMF key is a specific pair of tones, so there are known frequencies to look for.

That said just use the example and add a word profile for each tone.
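For anyone who would rather detect the tones directly instead of training profiles, DTMF pairs one low tone (697, 770, 852 or 941 Hz) with one high tone (1209, 1336, 1477 or 1633 Hz) per key. The Python sketch below swaps in the Goertzel algorithm, which measures the power at a single frequency and is a common alternative to a full FFT for this job. It is not part of the package, and the function names and the 8000 Hz sample rate are assumptions for the example.

```python
import math

LOW = [697, 770, 852, 941]        # row frequencies in Hz
HIGH = [1209, 1336, 1477, 1633]   # column frequencies in Hz
KEYS = ["123A", "456B", "789C", "*0#D"]

def goertzel_power(samples, sample_rate, freq):
    """Power of one target frequency in samples, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def detect_key(samples, sample_rate):
    """Pick the strongest row tone and column tone and look up the key.

    There is no silence check here, so it always returns some key.
    """
    low = max(LOW, key=lambda f: goertzel_power(samples, sample_rate, f))
    high = max(HIGH, key=lambda f: goertzel_power(samples, sample_rate, f))
    return KEYS[LOW.index(low)][HIGH.index(high)]

def make_tone(f1, f2, sample_rate=8000, count=400):
    """Synthesize a test signal: the sum of two sines, 50 ms at 8000 Hz by default."""
    return [math.sin(2 * math.pi * f1 * t / sample_rate) +
            math.sin(2 * math.pi * f2 * t / sample_rate) for t in range(count)]

print(detect_key(make_tone(770, 1336), 8000))
```

A real detector would also compare the winning powers against a threshold so that silence or speech is not reported as a key press.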

theylovegames · Dec 20, 2012

Version 1.5 is now in the asset store.

Foriero · Dec 21, 2012

Would it be possible to add to your package singing pitch recognition?

For example a child has a task to sing A and we would like to react if he/she is on the pitch or off the pitch.

We are creating music apps and your FFT solution is exactly what we need. (We already have this solution but we are not very precise)

theylovegames · Dec 21, 2012

Foriero said: ↑

Would it be possible to add to your package singing pitch recognition?

For example a child has a task to sing A and we would like to react if he/she is on the pitch or off the pitch.

We are creating music apps and your FFT solution is exactly what we need. (We already have this solution but we are not very precise)

Exactly. You are talking about the FFT output. To detect pitch (frequency), you look for the bins in the FFT output that rise above a certain threshold. You get more precise by increasing the sample size. The first 3 examples in this package display the spectral output.

If you normalize the FFT output, the values should peak at 1 around the pitch you are looking for.
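A minimal Python sketch of that idea follows. It normalizes the spectrum so the strongest bin is 1, then converts that bin back to Hz. It is an illustration only, with a naive transform standing in for the FFT and a made-up function name `detect_pitch`. It also shows why a larger sample size is more precise: each bin is sample rate divided by sample count wide.

```python
import cmath
import math

def detect_pitch(samples, sample_rate):
    """Return the frequency in Hz of the strongest bin, or None for silence."""
    n = len(samples)
    magnitudes = []
    for k in range(1, n // 2):  # skip bin 0, which is the constant offset
        total = 0j
        for t, x in enumerate(samples):
            total += x * cmath.exp(-2j * math.pi * k * t / n)
        magnitudes.append(abs(total))
    peak = max(magnitudes)
    if peak == 0:
        return None
    normalized = [m / peak for m in magnitudes]  # strongest bin becomes exactly 1.0
    k = normalized.index(1.0) + 1                # +1 because bin 0 was skipped
    return k * sample_rate / n

def sine(freq, sample_rate, count):
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(count)]

# The same 440 Hz tone analyzed with a long window and a short window.
long_window = detect_pitch(sine(440, 8000, 400), 8000)   # bins are 20 Hz wide
short_window = detect_pitch(sine(440, 8000, 128), 8000)  # bins are 62.5 Hz wide
print(long_window, short_window)
```

With the short window the answer can only land on a multiple of 62.5 Hz, so the estimate is off by a few Hz even for a clean tone.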

Foriero · Dec 22, 2012

Well, if you add a PitchDetection.cs next to your WordDetection.cs, I think it would be a nice addition for other developers too, and we will certainly buy your package.

We currently have an issue where we are able to detect pitch, but not with "octave" precision. That means if I ask a child to sing C4 or C3, we cannot tell with certainty whether he/she is singing C4 or C3. All we get from our FFT solution is that the child is more or less singing C, but the result from our FFT is sometimes C2 or C3 even if the child is singing C4.

So what we need is just your PitchDetection.cs, with StartDetecting() and StopDetecting() calls, since it is not necessary to have it running the whole time the app is on. The detected pitch should be reported in Hz.

Please let me know if it is possible for you to add this PitchDetection.cs script to your package. As I said, I think other music developers would also like to have this solution.

Many thanks, Marek
