
I have tried using both MFCC and Spectrogram feature extractors, and I can train a network that gets about 90% validation accuracy and about 85% accuracy in testing. However, I get very erratic results in the real world. When there is a normal-volume sound nearby, the model does a great job of determining whether it is a helicopter or not. When there isn't much sound, the predictions start jumping around like crazy: for the same ambient background noise it will go from 90% no helicopter to 90% helicopter and every prediction in between. I have tried recording the silence and adding it to the no-helicopter class. I have also added a loop that goes through the samples in the inference buffer to find the max/min, and the erratic behavior only happens when those are low (close to zero). Could this be because the normalize() function is amplifying the silence, which is probably just noise from the mic's A/D? Maybe that random noise ends up looking like a high-frequency source, like a helicopter, after feature extraction? Is there a clean way to add a noise floor threshold? I could just add something to the Arduino program, where if none of the samples go above a threshold, it doesn't bother running inference, because the window is too quiet to be discernible.
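
Something like this rough sketch is what I have in mind (the buffer name, length, and threshold value are made up for illustration, not Edge Impulse API):

```cpp
// Sketch of a noise-floor gate over the raw inference window.
// Assumes 16-bit PCM samples; names and threshold are illustrative.
#include <stdint.h>
#include <stddef.h>

// Tune this on-device: it should sit just above the mic's idle A/D noise.
const int16_t NOISE_FLOOR = 300;

// True when at least one sample exceeds the floor, i.e. the window is
// loud enough to be worth classifying.
bool above_noise_floor(const int16_t *buf, size_t len) {
  for (size_t i = 0; i < len; i++) {
    int32_t v = buf[i];          // widen so negating INT16_MIN is safe
    if (v < 0) v = -v;
    if (v > NOISE_FLOOR) return true;
  }
  return false;
}

// In the sketch's loop():
//   if (above_noise_floor(inference_buffer, INFERENCE_LEN)) {
//     run the classifier as usual
//   } else {
//     skip inference and report "too quiet" instead of a junk prediction
//   }
```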

I'm not an audio processing expert, but here are my thoughts. The sound of a helicopter contains regular periodic variations that the sound of “silence” (i.e. steady background noise) does not; for example, the “chop” of a rotor blade may happen every n milliseconds. The challenge here is to make sure our MFCC output represents these periodic variations in a way that is discernible from background noise.

The first thing to think about is the low-frequency cutoff. If your helicopter is making sounds below this frequency, they will be filtered out. I'd start by reducing this value and seeing if your results change.

The second is the MFCC output's resolution. The default parameters for the MFCC block have a frame length and frame stride of 0.02 seconds (20 ms), which means each column of the MFCC represents 20 ms of sound. If the “chop” of the rotor blades happens faster than every 20 ms, it may not be distinguishable from a constant background “hum”. So I would try increasing the resolution of your MFCC output by reducing the frame length and frame stride; this might result in an output that can be more easily distinguished from background noise. Of course, the larger MFCC output will require more memory and compute, but you could maybe get away with reducing the overall length of the window if this is a problem.

Give these a try and let me know how it goes!
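
To make the arithmetic concrete, here is a back-of-envelope check (the rotor numbers are assumptions for illustration, not measurements):

```cpp
// Back-of-envelope check of both points above. The rotor numbers are
// illustrative assumptions, not measurements.
#include <cstdio>

int main() {
  // Blade-pass ("chop") frequency = (rotor RPM / 60) * number of blades.
  float bpf_hz = (400.0f / 60.0f) * 2.0f;   // 2-blade rotor @ 400 RPM: ~13.3 Hz
  float chop_period_ms = 1000.0f / bpf_hz;  // one chop every ~75 ms

  // ~75 ms per chop is slower than a 20 ms frame, so the default stride can
  // resolve it -- but ~13 Hz sits below a speech-style low-frequency cutoff
  // (often 100-300 Hz), so the chop's fundamental would be filtered out
  // unless that cutoff is lowered.
  printf("blade-pass: %.1f Hz, one chop every %.1f ms\n", bpf_hz, chop_period_ms);

  // Halving frame length/stride from 0.02 s to 0.01 s doubles the time
  // resolution (50 -> 100 MFCC columns per second of audio), at roughly
  // double the memory and compute for the DSP output.
  float frame_stride_s = 0.01f;
  printf("MFCC columns per second: %.0f\n", 1.0f / frame_stride_s);
  return 0;
}
```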

I graphed the max sample value for each inference period and the resulting prediction confidence over two recordings. For the first period there were no helicopters present, and it was recorded outside. For the second period, a group of two helicopters flew past around the middle. The red line is about the volume level of talking in a normal voice a foot or two from the board. The model accurately predicts loud sounds; quiet sounds are all over the place when it comes to predictions. It still jumps around a lot when it is inside and it is quiet, but it will usually stay under 50% confidence. I have tried adding the background noise samples to the training data, but that doesn't seem to improve the model's stability. I could always weight the confidence by the max volume for an inference sample. I am going to try adding in even more samples and seeing if that helps. Any thoughts on other things I can try?
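
For the weighting idea, I'm picturing something like this hypothetical helper (the scaling is illustrative, not anything from the SDK):

```cpp
// Hypothetical helper for the "weight confidence by volume" idea: scale
// the classifier's confidence toward zero as the window's peak level
// approaches the noise floor.
#include <stdint.h>

float volume_weighted_confidence(float confidence, int16_t peak,
                                 int16_t noise_floor, int16_t full_scale) {
  if (peak <= noise_floor) return 0.0f;  // too quiet to trust at all
  float w = (float)(peak - noise_floor) / (float)(full_scale - noise_floor);
  if (w > 1.0f) w = 1.0f;                // clamp peaks at/above full scale
  return confidence * w;
}

// e.g. volume_weighted_confidence(0.9f, peak_from_buffer, 300, 32767)
```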

Hi, this actually coincides with some thinking we've been doing internally regarding normalization. I think you're probably right regarding the normalize function making it difficult to discern between quiet and loud noise. A solution to this I'd like to implement in Edge Impulse is normalization using learned parameters across the entire dataset, so the normalized spectrogram accurately represents the relative loudness or quietness rather than throwing that information away. I hope to do some work on this feature some time in the next few months. In the immediate term, I think your idea of adding a noise floor in your Arduino program is a good one. If you experiment with this, you should also screen your existing training/test data to remove any samples that fall beneath the noise floor.
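
To sketch what I mean by learned parameters (placeholder numbers; nothing like this exists in the SDK yet):

```cpp
// Sketch of the dataset-level normalization idea above (not an existing
// Edge Impulse feature). MEAN/STD are placeholders you would learn
// offline across the entire training set, then bake into the firmware.
#include <stddef.h>

const float DATASET_MEAN = -42.0f;  // e.g. mean log-spectrogram value (dB)
const float DATASET_STD  = 12.0f;   // e.g. std dev over the whole dataset

// Unlike per-window min/max normalization, fixed global statistics keep a
// quiet window quiet: silence is no longer stretched up to full scale.
void normalize_with_learned_stats(float *features, size_t len) {
  for (size_t i = 0; i < len; i++) {
    features[i] = (features[i] - DATASET_MEAN) / DATASET_STD;
  }
}
```

Offline, the same peak check as in the noise-gate sketch above could be used to screen the training/test clips.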

Thank you for your detailed feedback, this is extremely helpful. My colleague is much more of a DSP expert than I am, so I'm sure he can add some interesting thoughts.

I'd love to hear how this goes, and we'll keep you updated on our normalization work!