A Move to Offline Voice Recognition?

I’ve spent a lot of time over the last year or so with Google’s AIY Projects Voice Kit, including some time investigating how well TensorFlow ran locally on the Raspberry Pi. I was attempting to use models based around the initial data release of Google’s Open Speech Recording to customise the offline “wake word” for my voice-controlled Magic Mirror.

Back at the start of last year this was a hard thing to do, and it was really pushing the Raspberry Pi to its limits. However, as machine learning software such as TensorFlow Lite has matured, we’ve seen models being run successfully on much more minimal hardware.

With the privacy concerns raised by cloud-connected voice devices, as well as the sometimes inconvenient need for a network connection, it’s inevitable that we’ll start to see more offline devices.

We’ve seen a number of “wake word” engines: a piece of code and a trained network that monitors for the special word, like “Alexa” or “OK Google”, that activates your voice assistant. But these, like pretty much all modern voice recognition engines, need training data, and the availability of that sort of data has really held back smaller players.
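To make the idea of a wake word engine a little more concrete, here is a minimal sketch of the monitoring loop in Python. It is not how any particular engine works. The function names, the window size, and the threshold are all made up for illustration, each audio frame is reduced to a single number, and the "trained network" is replaced by a stand-in scoring function that you would swap for a real model.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def detect_wake_word(
    frames: Iterable[float],
    score_window: Callable[[List[float]], float],
    window_size: int = 4,
    threshold: float = 0.8,
) -> Iterator[int]:
    """Yield the index of each frame at which the wake word is detected.

    `frames` is a stream of per-frame features (a single float here, as a
    simplifying assumption). `score_window` stands in for a trained network
    and returns a confidence between 0 and 1 for the current window.
    """
    window: deque = deque(maxlen=window_size)
    for index, frame in enumerate(frames):
        window.append(frame)
        # Only score once we have a full window of audio to look at.
        if len(window) == window_size and score_window(list(window)) >= threshold:
            yield index
            # Drop the buffered audio so one utterance triggers only once.
            window.clear()


def toy_score(window: List[float]) -> float:
    """Hypothetical stand-in for a model: the mean value of the window."""
    return sum(window) / len(window)


if __name__ == "__main__":
    # Silence, then a burst of "speech", then silence again.
    stream = [0.0] * 5 + [1.0] * 4 + [0.0] * 5
    for hit in detect_wake_word(stream, toy_score):
        print(f"wake word detected at frame {hit}")
```

The loop itself is the easy part. What a real engine needs in place of `toy_score` is a network trained on many recordings of the wake word, which is where the training data problem comes in.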

Realistically, most people won’t be able to gather enough audio samples to train a network for a custom wake word. The success of machine learning has relied heavily on the corpus of training data that companies like Google have managed to build up. For the most part these training datasets are the secret sauce, and are closely held by the companies, and people, that have them. Although there are a number of open sourced collections of visual data to train object recognition algorithms, far less speech data is available. One of the few available sources is the Open Speech Recording project from Google, and while they’ve made an initial dataset release, it’s still fairly limited.

In practice it’s never going to be feasible for most people to build the required large datasets, and while people are investigating transfer learning it’s generally regarded as not being quite ready.


A Move to Offline Voice Recognition? was originally published in Hackster Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.