With the Web Speech API, we can recognize speech using JavaScript. It is easy to recognize speech in a browser and then use the resulting text as user input. We have already covered How to convert Text to Speech in Javascript.
However, browser support for this API is limited, and it works most reliably in Chrome and other Chromium-based browsers. If you are viewing this example in a different browser, the live example below might not work.
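Because support varies, it can help to check for the API before using it. The snippet below is a minimal sketch: the helper name getSpeechRecognition is hypothetical, and it assumes the constructor is exposed either as SpeechRecognition or under the webkit prefix.

```javascript
// Hypothetical helper: returns the speech recognition constructor exposed by
// the given global object, or null when the API is unavailable.
function getSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null;
}

// In a browser, pass `window`; outside a browser there is nothing to look up.
var root = typeof window !== "undefined" ? window : {};
var SpeechRecognitionCtor = getSpeechRecognition(root);

if (SpeechRecognitionCtor === null) {
  console.log("Speech recognition is not supported in this environment.");
}
```

When the helper returns null, the page can hide the microphone button or fall back to a regular text input.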
This tutorial covers a basic speech-to-text example. We will ask the user to say something, use the SpeechRecognition object to convert the speech into text, and then display the text on the screen.
The Web Speech API can be used for many other use cases. We can provide a list of rules for words or sentences as a grammar using the SpeechGrammarList object, which will be used to recognize and validate user input from speech.
For example, consider a webpage that shows a quiz with a question and 4 available options, where the user has to select the correct option. In this case, we can set the grammar for speech recognition to contain only the options for the question, so that anything the user says that is not one of the 4 options is not accepted.
We can use a grammar to define rules for speech recognition, configuring what our app understands and what it doesn't.
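The quiz idea above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the helper names buildOptionsGrammar and matchOption and the sample options are hypothetical, the grammar is written in JSGF format, and the transcript is also checked on the page because a recognizer is not guaranteed to restrict its output to the grammar.

```javascript
// Builds a JSGF grammar string that lists the allowed answers.
function buildOptionsGrammar(options) {
  return "#JSGF V1.0; grammar options; public <option> = " +
    options.join(" | ") + " ;";
}

// Returns the option that matches the transcript (ignoring case and
// surrounding whitespace), or null when the user said something else.
function matchOption(transcript, options) {
  var spoken = transcript.trim().toLowerCase();
  for (var i = 0; i < options.length; i++) {
    if (options[i].toLowerCase() === spoken) {
      return options[i];
    }
  }
  return null;
}

var options = ["Paris", "London", "Berlin", "Madrid"]; // example quiz options
var grammar = buildOptionsGrammar(options);

// Browser-only part: attach the grammar when the (possibly prefixed)
// constructors exist.
if (typeof window !== "undefined") {
  var GrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;
  var Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (GrammarList && Recognition) {
    var list = new GrammarList();
    list.addFromString(grammar, 1); // weight between 0 and 1
    var quizRecognition = new Recognition();
    quizRecognition.grammars = list;
  }
}
```

Inside an onresult handler, matchOption(transcript, options) would then return the chosen option, or null when the answer should be rejected.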
JavaScript Speech to Text
In the code example below, we will use the SpeechRecognition object. We set very few properties and rely on the default values for the rest. The example is a simple HTML webpage with a button to initiate the speech recognition.
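For reference, the properties most often adjusted are lang, continuous, interimResults, and maxAlternatives. The sketch below sets them explicitly; the helper name configureRecognition is hypothetical, and the "en-US" fallback is an assumption made for illustration (the API itself falls back to the language of the page).

```javascript
// Hypothetical helper: applies settings to a recognition object, falling back
// to the API's usual defaults when a setting is not supplied.
function configureRecognition(recognition, settings) {
  settings = settings || {};
  recognition.lang = settings.lang || "en-US"; // assumed fallback language
  recognition.continuous = settings.continuous === true; // default: false
  recognition.interimResults = settings.interimResults === true; // default: false
  recognition.maxAlternatives = settings.maxAlternatives || 1; // default: 1
  return recognition;
}

// A plain object stands in for a real SpeechRecognition instance here.
var configured = configureRecognition({}, { maxAlternatives: 3 });
console.log(configured);
```

In a browser, the first argument would be the object created with new SpeechRecognition().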
The main JavaScript code, which listens to what the user says and converts it to text, is this:
// Create a new speech recognition object (Chrome exposes it with the webkit prefix)
var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var recognition = new SpeechRecognition();

// This runs when the speech recognition service starts
recognition.onstart = function() {
    console.log("We are listening. Try speaking into the microphone.");
};

// This runs when the user is done speaking
recognition.onspeechend = function() {
    recognition.stop();
};

// This runs when the speech recognition service returns a result
recognition.onresult = function(event) {
    var transcript = event.results[0][0].transcript;
    var confidence = event.results[0][0].confidence;
    console.log(transcript + " (confidence: " + confidence + ")");
};

// Start recognition
recognition.start();
In the above code, the recognition.start() method starts the speech recognition.
Once we begin speech recognition, the onstart event handler can be used to inform the user that speech recognition has started and that they should speak into the microphone.
When the user is done speaking, the onresult event handler receives the result. The results property of the SpeechRecognitionEvent returns a SpeechRecognitionResultList object, which contains SpeechRecognitionResult objects. It has a getter, so it can be accessed like an array. The first [0] returns the SpeechRecognitionResult at position 0, which is the only result when the continuous property is left at its default value of false. Each SpeechRecognitionResult object contains SpeechRecognitionAlternative objects that hold the individual results. These also have getters, so they can be accessed like arrays. The second [0] returns the SpeechRecognitionAlternative at position 0. We then read the transcript property of the SpeechRecognitionAlternative object.
The same is done for the confidence property, which gives the API's own estimate of how reliable the result is.
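The nested indexing described above can be sketched with a small helper. The name summarizeResults is hypothetical, and the fake event only mimics the shape of a real SpeechRecognitionEvent so that the logic can be followed without a microphone.

```javascript
// Hypothetical helper: reads the top alternative of every result in an
// onresult event and returns the joined transcript with the lowest confidence.
function summarizeResults(event) {
  var parts = [];
  var lowest = event.results.length > 0 ? 1 : 0;
  for (var i = 0; i < event.results.length; i++) {
    var best = event.results[i][0]; // alternative at position 0
    parts.push(best.transcript.trim());
    if (best.confidence < lowest) {
      lowest = best.confidence;
    }
  }
  return { transcript: parts.join(" "), confidence: lowest };
}

// Stand-in for a real event, shaped like a SpeechRecognitionResultList.
var fakeEvent = {
  results: [
    [{ transcript: "hello world", confidence: 0.9 }]
  ]
};
var summary = summarizeResults(fakeEvent);
console.log(summary.transcript + " (" + summary.confidence + ")");
```

Looping over every result matters mainly when continuous is set to true; with the default settings the loop runs once.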
We have many event handlers to handle the events surrounding the speech recognition process. One such event handler is onspeechend, which we have used in our code to call the stop() method of the SpeechRecognition object to stop the recognition process.
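A few of the other handlers, onnomatch, onerror, and onend, can be attached in the same way. The sketch below is illustrative only: the helper name attachExtraHandlers and the report callback are hypothetical, and the messages are placeholders.

```javascript
// Hypothetical helper: attaches handlers for events the tutorial code does not
// cover. `report` is any function that shows a message to the user.
function attachExtraHandlers(recognition, report) {
  // Speech was captured but nothing matched with enough confidence
  recognition.onnomatch = function () {
    report("Speech was heard but could not be recognized.");
  };
  // Something went wrong; event.error holds a short error code
  recognition.onerror = function (event) {
    if (event.error === "not-allowed") {
      report("Microphone access was denied.");
    } else if (event.error === "no-speech") {
      report("No speech was detected.");
    } else {
      report("Recognition error: " + event.error);
    }
  };
  // The recognition service has disconnected
  recognition.onend = function () {
    report("Recognition has stopped.");
  };
  return recognition;
}

// A plain object stands in for a real SpeechRecognition instance here.
var messages = [];
var handled = attachExtraHandlers({}, function (text) {
  messages.push(text);
});
handled.onerror({ error: "not-allowed" });
console.log(messages);
```

In a browser, the first argument would be the recognition object and report could write the message into the page.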
Now let's see the running code:
When you run the code, the browser will ask for permission to use your microphone. Click Allow and then say anything to see the script in action.
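The page wiring behind such a demo could look roughly like the sketch below. It is an assumption about how the button and the on-screen output might be connected: the helper name wireSpeechButton and the element ids "speak" and "output" are hypothetical.

```javascript
// Hypothetical wiring: starts recognition on click and shows the transcript.
function wireSpeechButton(button, output, recognition) {
  button.addEventListener("click", function () {
    output.textContent = "Listening...";
    recognition.start();
  });
  recognition.onresult = function (event) {
    output.textContent = event.results[0][0].transcript;
  };
}

// Browser-only usage; the element ids "speak" and "output" are assumptions.
if (typeof document !== "undefined") {
  var Ctor = window.SpeechRecognition || window.webkitSpeechRecognition;
  var speakButton = document.getElementById("speak");
  var outputBox = document.getElementById("output");
  if (Ctor && speakButton && outputBox) {
    wireSpeechButton(speakButton, outputBox, new Ctor());
  }
}
```

Starting recognition from a click also means the microphone permission prompt appears in response to a user action rather than on page load.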
Conclusion:
In this tutorial, we learned how to use JavaScript to write a small application that converts speech into text and displays the text on the screen. We also made the whole process more interactive by using the various event handlers available in the SpeechRecognition interface. In the future, I will try to cover some simple web application ideas built on this feature to help you understand where it can be used.
If you face any issues running the above script, post in the comment section below. Remember that browser support is limited, and Chrome is the safest choice for trying it out.