Word Detection 1.5
Assign verbal commands to your gameplay on Standalone, Web Player, Android, and iOS. Control your game characters just by talking. Detection runs locally and does not require a network connection. You can add verbal commands at runtime.
1. Label the first item as Noise (most of the time it will just capture background noise)
2. Label each word that you add to make your life easier
3. Wait 1 second before speaking so that only background noise is captured at first
4. Speak the word you aim to detect
5. Immediately after speaking the word, click the button next to its label to build a profile
6. Always try to speak the word the same way, in the same tone and at the same loudness
7. If you have trouble matching a word, repeat the word and click its button again to update the word details
8. Use dissimilar words: avoid sets like "go, go go, gos" and prefer sets like "attack, defend, run, escape, affirmative, acknowledged, zzzz"
9. You can remove a word by clicking the remove button
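A minimal sketch of the training bookkeeping the steps above describe, written in Python for brevity (the package itself is C#). `ProfileStore`, `set_profile`, and `remove` are hypothetical names, a profile is assumed to be a list of spectrum magnitudes, and re-recording is assumed to simply replace the old profile.

```python
# Hypothetical sketch: one profile per label, with "Noise" required first (step 1).
class ProfileStore:
    def __init__(self):
        # dict keeps insertion order: label -> magnitude spectrum (list of floats)
        self.profiles = {}

    def set_profile(self, label, spectrum):
        """Build a profile (step 5) or replace it when the word is re-recorded (step 7)."""
        if not self.profiles and label != "Noise":
            raise ValueError('record the "Noise" profile first')
        self.profiles[label] = list(spectrum)

    def remove(self, label):
        """Delete a word's profile (step 9); unknown labels are ignored."""
        self.profiles.pop(label, None)

    def labels(self):
        return list(self.profiles)


store = ProfileStore()
store.set_profile("Noise", [1.0, 1.0, 1.0])
store.set_profile("attack", [5.0, 1.0, 0.0])
store.set_profile("attack", [6.0, 1.0, 0.0])  # re-recording replaces the old profile
store.remove("defend")                         # no-op: never trained
print(store.labels())                          # ['Noise', 'attack']
```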
Demo 1.1 - http://theylovegames.com/WordDetection_1_1.html
Demo 1.2 - http://theylovegames.com/WordDetection_1_2.html
Demo 1.3 - http://theylovegames.com/WordDetection_1_3.html
Demo 1.4 - http://theylovegames.com/WordDetection_1_4.html
API Overview - Word Detection Example:
The idea is cool... but honestly I couldn't tell whether your demo works or not.
I'll post the how-to video today.
The video I added explains each example scene in detail.
The last scene is the word detection scene and the video explains how to use it.
A user wrote in about a few words not being detected. Here we do word detection for "Test", "Weasel", and "Rutabaga".
Make sure your mic volume is set high enough to hear yourself. If the gain is too low, it will be tough to distinguish words from background noise.
I'm not doing any noise removal on the profiles yet, which I bet would increase the accuracy.
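One common noise-removal technique is spectral subtraction: subtract the background-noise magnitude from each frequency bin of a word's spectrum and clamp negative results to a floor. The post doesn't say which method the package will use; this Python sketch assumes spectra are equal-length lists of magnitudes and that the Noise profile from step 1 serves as the noise estimate. `subtract_noise` is a hypothetical name.

```python
def subtract_noise(spectrum, noise_spectrum, floor=0.0):
    """Spectral subtraction: remove the noise magnitude from each bin, never going below floor."""
    if len(spectrum) != len(noise_spectrum):
        raise ValueError("spectra must have the same number of bins")
    return [max(s - n, floor) for s, n in zip(spectrum, noise_spectrum)]


noise_profile = [1.0, 1.0, 2.0]  # e.g. the "Noise" label's spectrum (assumed)
word_profile = [5.0, 3.0, 1.0]
print(subtract_noise(word_profile, noise_profile))  # [4.0, 2.0, 0.0]
```

A small positive `floor` is sometimes used instead of zero to avoid completely empty bins.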
Having to click the set button after saying the word in the word detection example is a bit counterintuitive. I'll change that to a push-to-record button.
I've improved demo 1.2 by adding noise filtering, push-to-talk, and trimming. You can record profiles faster, and you can play back the sample, which helps if you've forgotten how you said the word last time.
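Trimming presumably means cutting the quiet lead-in and tail off a recording so that only the spoken word is profiled. A minimal Python sketch, assuming samples are floats in [-1, 1] and a fixed amplitude threshold; `trim_silence` is a hypothetical name, and real implementations usually threshold on short-window energy rather than on single samples.

```python
def trim_silence(samples, threshold):
    """Drop leading and trailing samples whose absolute amplitude is below threshold."""
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    end = len(samples)
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


recording = [0.0, 0.01, 0.5, -0.6, 0.3, 0.02, 0.0]
print(trim_silence(recording, 0.1))  # [0.5, -0.6, 0.3]
```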
In demo 1.3 you can use voice commands to move a cube around.
Is it just me, or does this not work that well in the Web Player? Is accuracy better in Standalone?
The accuracy is the same on all platforms.
I am in the process of improving the noise filtering algorithm.
My current theory is that the samples look similar because they all share the same background noise.
Audacity has great noise filtering; I'll review how it works.
In the meantime, while I improve the accuracy, make your voice commands distinct by enunciating clearly.
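A toy illustration of that theory in Python: when two different words share a loud background-noise spectrum, they look nearly identical under cosine similarity, while their noise-free spectra don't overlap at all. The four-bin spectra, the additive mixing, and the use of cosine similarity are assumptions for illustration, not the package's actual metric.

```python
import math

def cosine_similarity(a, b):
    """1.0 = same spectral shape, 0.0 = no overlapping energy."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

noise = [10.0, 10.0, 10.0, 10.0]  # same background noise in every recording
word_a = [4.0, 0.0, 0.0, 0.0]     # energy only in the lowest bin
word_b = [0.0, 0.0, 0.0, 4.0]     # energy only in the highest bin

# Simplification: treat magnitudes as additive when noise and speech mix.
noisy_a = [w + n for w, n in zip(word_a, noise)]
noisy_b = [w + n for w, n in zip(word_b, noise)]

print(round(cosine_similarity(noisy_a, noisy_b), 3))  # 0.968: the noise dominates
print(cosine_similarity(word_a, word_b))              # 0.0: the words themselves differ
```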
OK... I hope you can improve it. It's a brilliant idea and I'm eager to try it out; a few ideas come to mind. Any idea about the price?
Keep up the good work!
Does the tool support different languages? I'm interested.
I am very excited about Unity 4.
Sorry for my English.
That said, there's no reason you couldn't train it with common words to do dictation.
One tiny snag: I need to rewrite the FFT algorithm in C#. The Asset Store only supports the MIT, Creative Commons, and Simplified BSD (BSD-2-Clause) licenses, and the Fourier transform I was using was under the LGPL, which is too restrictive.
The FFT algorithm has been rewritten and replaced.
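For reference, the textbook iterative radix-2 Cooley-Tukey FFT is short enough to write from scratch, which is one way around license restrictions. This Python sketch is not the package's rewritten C# code; `fft_in_place` is a hypothetical name, the input length is assumed to be a power of two, and the list is transformed in place.

```python
import cmath

def fft_in_place(a):
    """Iterative radix-2 Cooley-Tukey FFT; overwrites list a. len(a) must be a power of two."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    # Reorder elements into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes: combine transforms of size/2 into transforms of size.
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        size <<= 1
    return a


spectrum = fft_in_place([complex(v) for v in [1, 1, 1, 1]])
print([round(abs(c), 6) for c in spectrum])  # [4.0, 0.0, 0.0, 0.0]
```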
This package has been accepted into the Asset Store.
Now that the package has been accepted, I can move forward with better noise removal logic.
I still plan to implement the Audacity noise removal algorithm, which has great noise matching. I figure that if it can accurately match a noise profile, it might also do better pattern matching.
Accuracy could increase if you use push-to-talk for voice commands, because then the length of the word would be a better match. Right now it matches on the first few syllables, which makes almost every word look like a match.
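One way to use the word's length once push-to-talk gives a clean start and end: add a duration-mismatch penalty to whatever spectral distance is already computed. A Python sketch; `match_score`, `best_match`, the log-ratio penalty, and `length_weight` are assumptions, not the package's API.

```python
import math

def match_score(spectral_distance, sample_len, profile_len, length_weight=1.0):
    """Lower is better: spectral distance plus a penalty for differing durations.

    abs(log(ratio)) treats "twice as long" and "half as long" as equally bad.
    """
    length_penalty = abs(math.log(sample_len / profile_len))
    return spectral_distance + length_weight * length_penalty

def best_match(candidates, sample_len, length_weight=1.0):
    """candidates: (label, spectral_distance, profile_len_in_samples) tuples."""
    return min(
        candidates,
        key=lambda c: match_score(c[1], sample_len, c[2], length_weight),
    )[0]


# "go" is slightly closer spectrally, but "escape" matches the spoken length.
candidates = [("go", 0.30, 4000), ("escape", 0.35, 12000)]
print(best_match(candidates, sample_len=11500))                     # escape
print(best_match(candidates, sample_len=11500, length_weight=0.0))  # go
```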
Another demo could be dictation, if we train on a bunch of letter combinations: St, Th, Ch, ABC. This would need a wave analysis algorithm in addition to spectrum analysis.
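One simple wave (time-domain) feature that could complement the spectrum is the zero-crossing rate: hiss-like consonants such as "s" and "sh" cross zero far more often than voiced vowels. This Python sketch shows one possible feature, not the author's stated plan; `zero_crossing_rate` and the 8 kHz sample rate are assumptions.

```python
import math

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign differs (0.0 to 1.0)."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


rate = 8000  # samples per second (assumed)
low = [math.sin(2 * math.pi * 200 * t / rate) for t in range(800)]    # vowel-like pitch
high = [math.sin(2 * math.pi * 3000 * t / rate) for t in range(800)]  # hiss-like band
print(zero_crossing_rate(low) < zero_crossing_rate(high))  # True
```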
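The fingerprinting process takes the original wave and runs an FFT on it. A wave is amplitude over time. The length of one cycle (the period) determines the frequency: the shorter the cycle, the higher the frequency. The FFT measures how much of each frequency is present. The math is tricky because a recorded wave is the sum of many frequencies at once, so what you get back is a table of magnitudes, one per frequency bin.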
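To make the frequency-bin idea concrete: with N samples at sample rate fs, bin k of the transform corresponds to k * fs / N Hz, and a wave made of two tones shows two peaks. This Python sketch uses a naive O(N^2) DFT for clarity instead of an FFT; the 8 kHz rate, the 256-sample window, and the function names are illustrative assumptions.

```python
import math

def magnitude_spectrum(wave):
    """Naive DFT magnitudes for bins 0..N/2 (an FFT gives the same values faster)."""
    n = len(wave)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(wave[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(wave[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

def bin_frequency(k, n, sample_rate):
    """Frequency in Hz represented by bin k of an n-point transform."""
    return k * sample_rate / n


rate, n = 8000, 256
# A 1000 Hz tone plus a quieter 2000 Hz tone: the wave is a sum of frequencies.
wave = [math.sin(2 * math.pi * 1000 * t / rate) + 0.5 * math.sin(2 * math.pi * 2000 * t / rate)
        for t in range(n)]
mags = magnitude_spectrum(wave)
peaks = sorted(range(len(mags)), key=mags.__getitem__)[-2:]
print(sorted(bin_frequency(k, n, rate) for k in peaks))  # [1000.0, 2000.0]
```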
The fingerprint is all the frequencies of the word over its whole length, which is less accurate than splitting the word into small chunks and computing the frequencies of each segment. Chunking would give us a more accurate fingerprint.
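Splitting the word into chunks and taking a spectrum of each is essentially a short-time Fourier transform (a spectrogram). The Python sketch below shows why it helps: a word whose pitch rises and one whose pitch falls contain the same frequencies overall, but their per-chunk peaks come out in opposite order. The frame size, hop, and `frame_spectra` name are illustrative assumptions; a real version would typically apply a window function to each frame and use an FFT.

```python
import math

def frame_spectra(wave, frame_size, hop):
    """Magnitude spectrum (bins 0..frame_size/2) for each frame of the wave (naive DFT)."""
    spectra = []
    for start in range(0, len(wave) - frame_size + 1, hop):
        frame = wave[start:start + frame_size]
        mags = []
        for k in range(frame_size // 2 + 1):
            re = sum(frame[t] * math.cos(2 * math.pi * k * t / frame_size) for t in range(frame_size))
            im = -sum(frame[t] * math.sin(2 * math.pi * k * t / frame_size) for t in range(frame_size))
            mags.append(math.hypot(re, im))
        spectra.append(mags)
    return spectra

def tone(freq, start, count, rate=8000):
    """Samples start..start+count-1 of a sine at freq Hz."""
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(start, start + count)]

def peak_bins(spectra):
    """Index of the strongest bin in each frame."""
    return [max(range(len(s)), key=s.__getitem__) for s in spectra]


rising = tone(1000, 0, 128) + tone(2000, 128, 128)   # low half, then high half
falling = tone(2000, 0, 128) + tone(1000, 128, 128)  # high half, then low half
print(peak_bins(frame_spectra(rising, 64, 64)))      # [8, 8, 16, 16]
print(peak_bins(frame_spectra(falling, 64, 64)))     # [16, 16, 8, 8]
```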
The fingerprinting happens in SetProfile in Example4.cs, and the selection algorithm lives in WordDetection.cs.
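The selection logic in WordDetection.cs isn't shown in the post. As a rough illustration only, a nearest-profile selector in Python might pick the trained label whose spectrum is closest in Euclidean distance; the metric and the `select_word` name are assumptions.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_word(sample_spectrum, profiles):
    """Return the label whose profile spectrum is closest to the sample."""
    if not profiles:
        raise ValueError("no profiles trained")
    return min(profiles, key=lambda label: euclidean(sample_spectrum, profiles[label]))


profiles = {"Noise": [1.0, 1.0, 1.0], "attack": [5.0, 1.0, 0.0], "defend": [0.0, 1.0, 5.0]}
print(select_word([4.0, 1.0, 1.0], profiles))  # attack
print(select_word([1.0, 1.0, 1.2], profiles))  # Noise
```

Keeping Noise as a regular profile means background sound is "detected" as Noise instead of being forced onto a real word.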
WordDetails.cs holds the Wave and the Spectrum, although I should add a List<Spectrum> for the profile chunks.
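A sketch of what WordDetails might look like with the added chunk list, in Python rather than C#. Only Wave, Spectrum, and List<Spectrum> come from the post; the `label` field, the snake_case names, and the `chunk_distance` comparison are assumptions. Comparing aligned chunks ignores differences in word length, which would need separate handling.

```python
import math
from dataclasses import dataclass, field
from typing import List

Spectrum = List[float]

@dataclass
class WordDetails:
    """Hypothetical Python mirror of WordDetails.cs plus the proposed chunk list."""
    label: str
    wave: List[float] = field(default_factory=list)              # raw samples
    spectrum: Spectrum = field(default_factory=list)             # whole-word spectrum
    chunk_spectra: List[Spectrum] = field(default_factory=list)  # one spectrum per chunk

    def chunk_distance(self, other: "WordDetails") -> float:
        """Mean Euclidean distance between aligned chunks (extra chunks are ignored)."""
        pairs = list(zip(self.chunk_spectra, other.chunk_spectra))
        if not pairs:
            raise ValueError("no chunk spectra to compare")
        total = sum(
            math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) for a, b in pairs
        )
        return total / len(pairs)


a = WordDetails("run", chunk_spectra=[[1.0, 0.0], [0.0, 1.0]])
b = WordDetails("run", chunk_spectra=[[1.0, 0.0], [0.0, 3.0]])
print(a.chunk_distance(b))  # 1.0
```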
I would also need to modify FourierTransform.cs to work over the range of each chunk without making a bunch of array copies.
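One way to transform a chunk without copying it is to pass the transform an offset and a stride into the shared sample buffer and index it directly. This Python sketch uses a recursive radix-2 FFT; `fft_range` and its parameters are hypothetical, and the chunk length must be a power of two. It still allocates the output lists, but it never slices or copies the input.

```python
import cmath

def fft_range(buf, offset, n, stride=1):
    """Radix-2 FFT of buf[offset], buf[offset + stride], ... (n elements) without slicing buf."""
    if n == 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if offset < 0 or offset + stride * (n - 1) >= len(buf):
        raise IndexError("range falls outside the buffer")
    if n == 1:
        return [complex(buf[offset])]
    # Even-indexed elements start at offset; odd-indexed at offset + stride; both step 2*stride.
    even = fft_range(buf, offset, n // 2, stride * 2)
    odd = fft_range(buf, offset + stride, n // 2, stride * 2)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


buf = [9.0, 9.0, 1.0, 1.0, 1.0, 1.0, 9.0, 9.0]  # one shared recording buffer
chunk = fft_range(buf, offset=2, n=4)           # transform samples 2..5 only
print([round(abs(c), 6) for c in chunk])        # [4.0, 0.0, 0.0, 0.0]
```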