LAST UPDATED: AUGUST 9, 2021

JavaScript Speech Recognition Example (Speech to Text)

    With the Web Speech API, we can recognize speech using JavaScript. It is super easy to recognize speech in a browser using JavaScript and then get the text from the speech to use as user input. We have already covered How to convert Text to Speech in Javascript.

    But browser support for this API is limited. At the time of writing it works mainly in Chrome, which exposes it through the prefixed webkitSpeechRecognition constructor. So if you are viewing this example in another browser, it might not work.


    This tutorial covers a basic speech-to-text example. We ask the user to say something, use the SpeechRecognition object to convert the speech into text, and then display the text on the screen.

    The Web Speech API can be used for many other use cases too. We can provide a list of rules for words or sentences as a grammar using the SpeechGrammarList object, which is then used to recognize and validate the user's spoken input.
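As a rough illustration, SpeechGrammarList.addFromString() takes a grammar written in the JSpeech Grammar Format (JSGF). In the sketch below, buildJsgfGrammar and attachGrammar are hypothetical helper names, not part of the API. The sketch assumes the allowed phrases are plain words without JSGF special characters, and it falls back to the prefixed webkitSpeechGrammarList constructor in the same way the SpeechRecognition code falls back to webkitSpeechRecognition.

```javascript
// Hypothetical helper: build a JSGF grammar string listing the allowed phrases.
// Assumes plain words/phrases (no JSGF special characters such as < > | ;).
function buildJsgfGrammar(ruleName, phrases) {
    return "#JSGF V1.0; grammar " + ruleName + "; public <" + ruleName + "> = " +
        phrases.join(" | ") + " ;";
}

// Hypothetical helper: attach the grammar to a SpeechRecognition object (browser only).
function attachGrammar(recognition, grammar) {
    var GrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;
    if (!GrammarList) {
        return false; // grammars are not available in this browser
    }
    var list = new GrammarList();
    list.addFromString(grammar, 1); // weight is between 0 and 1; 1 is the default
    recognition.grammars = list;
    return true;
}
```

For example, buildJsgfGrammar("colors", ["red", "green", "blue"]) returns the string #JSGF V1.0; grammar colors; public <colors> = red | green | blue ;. How strictly a browser applies such a grammar may vary, so it is best treated as a hint.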

    For example, consider a webpage that shows a quiz, with a question and 4 options, where the user has to select the correct option. In this case, we can set the grammar for speech recognition to contain only the options for the question, so if the user says anything that is not one of the 4 options, it will not be recognized.
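Because grammars may not be enforced the same way in every browser, a quiz page could also check the transcript itself. A minimal sketch, assuming the options are plain strings: matchQuizOption and normalize are hypothetical helpers that ignore case, punctuation, and extra spaces, and return the index of the option the user said, or -1 if the answer is not one of the options.

```javascript
// Hypothetical helper: lower-case, strip punctuation, and collapse whitespace.
function normalize(text) {
    return text
        .toLowerCase()
        .replace(/[^\p{L}\p{N}\s]/gu, "") // drop punctuation the recognizer may add
        .replace(/\s+/g, " ")
        .trim();
}

// Hypothetical helper: index of the option the user said, or -1 if none matches.
function matchQuizOption(transcript, options) {
    var spoken = normalize(transcript);
    for (var i = 0; i < options.length; i++) {
        if (normalize(options[i]) === spoken) {
            return i;
        }
    }
    return -1;
}
```

For example, with the options ["Paris", "London", "New York", "Tokyo"], a transcript of "new york." returns 2, while "Berlin" returns -1.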

    We can use a grammar to define rules for speech recognition, configuring what our app understands and what it doesn't.

    JavaScript Speech to Text

    In the code example below, we use the SpeechRecognition object. We set very few properties and rely on the default values. The example uses a simple HTML webpage with a button to start the speech recognition.
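The page's HTML is not shown here, but starting recognition from a button could look roughly like this. wireStartButton is a hypothetical helper that assumes button is a DOM button element and recognition is a SpeechRecognition object. Disabling the button while listening is a design choice: calling start() while a session is already running throws an InvalidStateError, so the button stays disabled until the end event fires.

```javascript
// Hypothetical helper: start recognition when the button is clicked.
function wireStartButton(button, recognition) {
    button.addEventListener("click", function () {
        button.disabled = true; // avoid calling start() twice on a running session
        recognition.start();
    });

    // The end event fires when the recognition service disconnects.
    recognition.onend = function () {
        button.disabled = false; // let the user try again
    };
}
```

In a page with a single button you might call it as wireStartButton(document.querySelector("button"), recognition).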

    The main JavaScript code, which listens to what the user says and converts it to text, is this:

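    // Use the standard constructor if available, otherwise Chrome's prefixed one
    var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

    if (!SpeechRecognition) {
        console.log("Sorry, speech recognition is not supported in this browser.");
    } else {
        // new speech recognition object
        var recognition = new SpeechRecognition();

        // This runs when the speech recognition service starts
        recognition.onstart = function() {
            console.log("We are listening. Try speaking into the microphone.");
        };

        recognition.onspeechend = function() {
            // when the user is done speaking
            recognition.stop();
        };

        // This runs when the speech recognition service returns a result
        recognition.onresult = function(event) {
            var transcript = event.results[0][0].transcript;
            var confidence = event.results[0][0].confidence;
            // show the recognized text (here in the console; a page could write it into an element)
            console.log("You said: " + transcript + " (confidence: " + confidence + ")");
        };

        // start recognition
        recognition.start();
    }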

    In the above code, we have used:

    The recognition.start() method starts the speech recognition.

    Once speech recognition begins, the onstart event handler can be used to tell the user that recognition has started and that they should speak into the microphone.

    When the service returns a result, the onresult event handler receives it. The results property of the SpeechRecognitionEvent returns a SpeechRecognitionResultList object, which contains SpeechRecognitionResult objects. It has a getter, so it can be accessed like an array. The first [0] returns the SpeechRecognitionResult at position 0; with the default settings the list holds only one result, so this is also the last one. Each SpeechRecognitionResult object contains SpeechRecognitionAlternative objects that hold individual results. These also have getters, so they can be accessed like arrays. The second [0] returns the SpeechRecognitionAlternative at position 0. We then read the transcript property of that SpeechRecognitionAlternative object.
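If continuous is set to true instead of the default, the results list keeps growing during the session, and event.resultIndex gives the index of the first result that changed in the current event. A minimal sketch under that assumption: collectFinalTranscript (a hypothetical name) walks the list from resultIndex using only length and index access, so it also works on plain arrays, and joins the transcript of the first alternative of every result whose isFinal flag is true.

```javascript
// Hypothetical helper: join the first alternative of every final result
// that changed in this result event.
function collectFinalTranscript(event) {
    var text = "";
    for (var i = event.resultIndex; i < event.results.length; i++) {
        var result = event.results[i];
        if (result.isFinal) {
            text += result[0].transcript; // [0] = first alternative of this result
        }
    }
    return text;
}
```

In the browser you would set recognition.continuous = true and call collectFinalTranscript(event) inside recognition.onresult.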

    Likewise, event.results[0][0].confidence gives the confidence property, a number between 0 and 1 that indicates how confident the recognition system is in the result.
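The maxAlternatives property (1 by default) lets the service return more than one alternative per result, each with its own confidence. The sketch below picks the alternative with the highest confidence and discards it if it falls below a minimum. pickBestAlternative and the 0.5 default threshold are hypothetical choices for illustration, not values from the API.

```javascript
// Hypothetical helper: return the most confident alternative in a result,
// or null if even the best one is below minConfidence.
function pickBestAlternative(result, minConfidence) {
    if (minConfidence === undefined) {
        minConfidence = 0.5; // arbitrary example threshold
    }
    var best = null;
    for (var i = 0; i < result.length; i++) {
        if (best === null || result[i].confidence > best.confidence) {
            best = result[i];
        }
    }
    return best !== null && best.confidence >= minConfidence ? best : null;
}
```

For example, with recognition.maxAlternatives = 3, an onresult handler could call pickBestAlternative(event.results[0]) and ask the user to repeat the answer when it returns null.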

    The SpeechRecognition interface has many event handlers for the events that occur during the speech recognition process. One of them is onspeechend, which we use in our code to call the stop() method of the SpeechRecognition object and end the recognition process.
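These event handlers can be combined so that a single listening session is easier to use from other code. A minimal sketch, not taken from the original example: listenOnce (a hypothetical helper) wraps one session in a Promise. It stops on speechend as the code above does, resolves with the first transcript, rejects on an error event (whose error property holds a code such as "no-speech" or "not-allowed"), and resolves with null if the session ends without a result.

```javascript
// Hypothetical helper: run one recognition session and report its outcome.
// `recognition` is assumed to be a SpeechRecognition object.
function listenOnce(recognition) {
    return new Promise(function (resolve, reject) {
        var settled = false;

        recognition.onspeechend = function () {
            recognition.stop(); // stop listening once the user goes quiet
        };

        recognition.onresult = function (event) {
            settled = true;
            var alternative = event.results[0][0];
            resolve({ transcript: alternative.transcript, confidence: alternative.confidence });
        };

        recognition.onerror = function (event) {
            settled = true;
            reject(new Error("Speech recognition error: " + event.error));
        };

        recognition.onend = function () {
            // The session ended without a result or an error
            if (!settled) {
                resolve(null);
            }
        };

        recognition.start();
    });
}
```

Usage might look like listenOnce(recognition).then(function (r) { if (r) console.log(r.transcript); }).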


    When you run the code, the browser will ask for permission to use your microphone, so please click Allow and then say something to see the script in action.

    Conclusion:

    So in this tutorial we learned how to use JavaScript to write a small application that converts speech into text and displays the text output on the screen. We also made the process more interactive by using the various event handlers available in the SpeechRecognition interface. In the future, I will try to cover some simple web application ideas that use this feature of JavaScript, to help you understand where it can be used.

    If you face any issue running the above script, post in the comment section below. Remember, browser support is limited, so try it in Chrome.


    I like writing content about C/C++, DBMS, Java, Docker, general How-tos, Linux, PHP, Go lang, Cloud, and Web development. I have 10 years of diverse experience in software development. Founder @ Studytonight