Transcribe Speech on your Website

We all enjoy talking to Google and running a search just by speaking to our phone. Ever thought of doing that on your own website, so that users can talk into your forms rather than type everything out? Yes, it is possible, and quite easy to do! Check it out.

Web Speech API

The Web Speech API helps us process speech in the browser. Depending on the implementation, the browser may send the audio to a cloud service or process it locally, but either way it is free for us, and quite efficient.

This can be useful for filling in large chunks of a form, where the person prefers to talk rather than type so much. Typically, we would have a set of text fields / text areas that are part of a form, and the person would just speak to fill in each, or some, of them.

Many modern browsers support the Web Speech API, at least in part. Of course, you have to forgive IE. It works perfectly well on Chrome, which exposes the recognition API with a webkit prefix. Firefox? Well, it works only if you manage to get past all the restrictions imposed by the browser.
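Since support differs from browser to browser, it helps to check for the API before using it. Here is a minimal sketch of such a check. The helper name getSpeechRecognition is my own (it is not part of the API); it simply looks for the standard SpeechRecognition constructor first and the webkit-prefixed one second.

```javascript
// Hypothetical helper: return the SpeechRecognition constructor exposed by
// the given global object, or null when the browser does not provide one.
function getSpeechRecognition(globalObject) {
    return globalObject.SpeechRecognition ||
        globalObject.webkitSpeechRecognition ||
        null;
}

// In a browser, pass `window`. The guard lets the snippet load elsewhere too.
if (typeof window !== 'undefined') {
    const Recognition = getSpeechRecognition(window);
    if (!Recognition) {
        console.log('Speech recognition is not available; fall back to typing.');
    }
}
```

If the helper returns null, the simplest option is to hide the Start button and let the person type as usual.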

Code

Excited? Without much ado, let's jump into the code. Let's start with a simple HTML page with an empty text area in it, along with a Start button. Clicking the Start button invokes the "start()" function in the JavaScript. I am sure the HTML is trivial for anyone reading this blog, so let us not waste time on that, and jump into the core JavaScript code.

// Instantiate the objects (Chrome exposes the constructor with a webkit prefix)
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

// jQuery handle to the textarea
const textbox = $('#textbox');

// The variable that holds the contents of the textarea.
var content = '';

// Configure the speech recognition object
recognition.continuous = true;

// On identifying the text in speech, update the text area contents
recognition.onresult = function(event) {
    console.log(JSON.stringify(event));
    var current = event.resultIndex;
    var transcript = event.results[current][0].transcript;
    content += transcript;
    textbox.val(content);
};

// Add any hooks you like when recognition starts
recognition.onstart = function(e) {
    console.log(JSON.stringify(e));
}

// Add any hooks you like when the speech stops
recognition.onspeechend = function(e) {
    console.log(JSON.stringify(e));
}

// Add any hooks you like when there is an error
recognition.onerror = function(event) {
    // event.error is a short code such as 'no-speech' or 'not-allowed'
    console.log(event.error);
    if (event.error === 'no-speech') {
        // No speech recognized
    }
};

// This function triggers the speech recognition
const start = function(e) {
    console.log(JSON.stringify(e));
    if (content.length) {
        content += ' ';
    }
    recognition.start();
};

// The user can manually edit the text in the textarea. 
textbox.on('input', function() {
    content = $(this).val();
})

This code is quite intuitive. It just instantiates an object of the SpeechRecognition API, and configures it to keep listening to input from the microphone. As the user speaks into the microphone, the audio is transcribed and the extracted text is appended to the textarea.
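The handler above reads only the result at event.resultIndex. A slightly more defensive variant, sketched below, walks every result from resultIndex to the end of event.results and keeps only the ones marked final. The function name collectFinalTranscript is my own, and the sketch assumes the event has the usual shape (results, resultIndex, isFinal, transcript).

```javascript
// Hypothetical helper: join the transcripts of all final results that are
// new in this event, starting from event.resultIndex.
function collectFinalTranscript(event) {
    let text = '';
    for (let i = event.resultIndex; i < event.results.length; i++) {
        const result = event.results[i];
        if (result.isFinal) {
            // The first alternative is the recognizer's best guess
            text += result[0].transcript;
        }
    }
    return text;
}
```

Inside recognition.onresult, you could then write content += collectFinalTranscript(event); in place of the two lines that pick out a single transcript. This matters mostly if you turn on interimResults, since interim guesses would otherwise end up in the textarea.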

Of course, it is not as smart as Alexa, and it misses some words at times. But it is quite handy when filling forms - and for other such automations. And of course, it is free! So make sure you use it in your next project!
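For a form with several fields, one possible approach is to send each recognized phrase to whichever field the person is currently working in. The sketch below is only an illustration: appendTranscript, attachDictation and getActiveField are hypothetical names of my own, and a field is assumed to be anything with a value property (an input or textarea element, for example).

```javascript
// Append a spoken phrase to a field's existing text, with a single space
// between the two when both are non-empty.
function appendTranscript(existing, transcript) {
    const spoken = transcript.trim();
    if (!spoken) return existing;
    if (!existing) return spoken;
    return existing.replace(/\s+$/, '') + ' ' + spoken;
}

// Hypothetical wiring: write each recognized phrase into whichever field
// getActiveField() returns (for example, the last focused input).
function attachDictation(recognition, getActiveField) {
    recognition.onresult = function(event) {
        const field = getActiveField();
        if (!field) return;
        const transcript = event.results[event.resultIndex][0].transcript;
        field.value = appendTranscript(field.value, transcript);
    };
}
```

In the page, getActiveField could return the element stored by a focus handler on the form's inputs, so the person clicks a field, speaks, and moves on to the next one.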

Want to see this working? Check it out live on a website that I hosted. Let me know if you like it!

Useful references: