In my previous post, I wrote about a web application that uses Amazon Polly to notify users with natural speech when their cards are sold. In this post, I will show how to use Amazon Lex to create a web application that users can command with their own speech.
Amazon Lex provides conversational text and voice interfaces powered by the same deep learning technologies as Amazon Alexa. Using Lex, we can create bots and applications that understand text and speech commands.
I will use the application developed in my previous post as a starting point; its code can be found in my GitHub repository.
The sample application will use an Amazon Lex bot to allow users to execute commands like add a card, show my cards, sell a card and log out. The application will record audio in the browser and send it to Lex for processing. The Lex bot will process the audio input and respond with an audio message, which will be played in the browser. When a command is understood, the application will execute it in the web app. The flow is shown in the picture below.
The steps to develop the application are below.
1. Create the Amazon Lex bot
2. Test the bot
3. Create the Amazon Lex client
4. Create the SpeechController
5. Change the dashboard to interact with audio commands
Let's start.
1. Create the Amazon Lex Bot
To create the bot, log in to the AWS Management Console and select Lex. Please note that Lex is currently only available in the N. Virginia region.
For more information on creating a custom Lex bot, see Exercise 2: Create a Custom Amazon Lex Bot in the Lex documentation.
Click Get Started to start. Click Custom Bot and enter CardStore as the Bot name. Select the Output voice you want; I will use Joanna for this post. Enter a Session timeout, select No in the COPPA section, and click Create to create your bot.
Your bot will be created, as in the picture below, without any intents. Intents will be used to execute different commands in our application.
Click Create Intent to create the first intent. Our first intent will be the LogOut intent, which will be used to log out of the web application.
In the popup window click Create new intent and enter LogOut as the intent name.
Click Add and the LogOut intent will be created like below.
Enter "I want
to log out" in the Sample
utterances section and click + icon. Repeat the step for "I would like to log out". In the Confirmation prompt section click Confirmation prompt and enter "Are you sure you want to log out?"
in the Confirm text box and "Okay.
You are staying" in the Cancel text box.
Click Save Intent to
save LogOut intent.
Click the + icon next to the Intents section to add the AddCard intent. Enter the "I want to add a card" and "I would like to add a card" utterances and click Save Intent to save the AddCard intent.
Repeat the same steps for the ShowMyCards intent and enter "I want to see my cards" and "I would like to see my cards" as Sample utterances.
The last intent we will add is the SellACard intent, which will be used for selling a card. Selling a card requires two values to be entered: the card name and the card price. In Lex, we use slots to represent values entered by the user. For the card price slot we will use the built-in AMAZON.NUMBER slot type, but for the card name we will use a custom slot type.
To create the CardName custom slot type, click the + icon next to Slot types. Enter CardName as the Slot type name and "Name of the card" as the Description, enter some values like below, and click Save slot type.
Create the SellACard intent and enter "I want to sell a card" and "I would like to sell a card" as Sample utterances.
In the Slots section, enter CardName as the Name and select CardName from the Slot type combo box. Enter "Which card?" as the Prompt and click the + icon to add the CardName slot.
To create the CardPrice slot, select the Required check box, enter CardPrice as the Name, and select AMAZON.NUMBER from the Slot type combo box. Enter "At what price?" as the Prompt and click the + icon.
For the SellACard intent, we should add a confirmation prompt to prevent accidentally selling a card. In the Confirmation prompt section, click Confirmation prompt and enter "Are you sure you want to sell your card, '{CardName}', for {CardPrice}$?" in the Confirm text box and "Okay. Your card will stay with you" in the Cancel text box.
Click Save Intent to save the SellACard intent.
Now, our CardStore bot is ready to be built and tested. Click Build at the top right to build the bot. It will take some time. After the bot is built, you can test it.
2. Test the Bot
To test the bot, click the blue Test Bot button at the bottom right. You can test the bot by entering commands in the chat window or by speaking after clicking the mic icon. If the bot does not understand you, you can refresh the browser page and try again.
More than one intent can be processed by one bot. Lex uses the sample utterances to determine the intent. After the intent is determined, Lex tries to get the slots one by one, if there are any. After all the slots are received, Lex asks the confirmation prompt, and if the user says 'Yes', the intent is ready to be fulfilled and the slot values are shown like in the picture below.
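For example, a complete SellACard conversation might go like this (the card name 'Lion' is just a hypothetical slot value):
User: I would like to sell a card
Bot: Which card?
User: Lion
Bot: At what price?
User: 30
Bot: Are you sure you want to sell your card, 'Lion', for 30$?
User: Yes
At this point the dialog state becomes ReadyForFulfillment with CardName = Lion and CardPrice = 30.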
Normally, the client is expected to fulfill the intent with the specified slot values. Lex allows a Lambda function to be executed for fulfillment, but for simplicity I won't use Lambda functions in this post.
Lex also allows a specific Lambda function to be used for slot validation, which can be very useful for validating complex slot values.
3. Create the Amazon Lex client
Add the Maven dependency for the Amazon Lex SDK.
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-lex</artifactId>
<version>1.11.119</version>
</dependency>
Please note that if you use version 1.11.119 of the AWS Java SDK for Lex, you should also set the same version for the other AWS Java SDK dependencies like DynamoDB, Polly, etc.
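If you prefer not to repeat the version on every dependency, one option (not used in the sample project, just a suggestion) is to import the AWS SDK bill of materials and let it manage the versions:
<dependencyManagement>
<dependencies>
<!-- Keeps all com.amazonaws artifacts on the same SDK version -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-bom</artifactId>
<version>1.11.119</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>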
Add com.cardstore.lex.LexClient like below. This client will be used to access the Lex bot. The post method will be used to post the recorded audio input to the Lex bot, and the LexPostResult class will hold the conversation data that the Lex bot returns.
package com.cardstore.lex;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import org.springframework.util.StreamUtils;
import com.amazonaws.regions.Region;
import com.amazonaws.services.lexruntime.AmazonLexRuntime;
import com.amazonaws.services.lexruntime.AmazonLexRuntimeClient;
import com.amazonaws.services.lexruntime.AmazonLexRuntimeClientBuilder;
import com.amazonaws.services.lexruntime.model.PostContentRequest;
import com.amazonaws.services.lexruntime.model.PostContentResult;
public class LexClient {
private final AmazonLexRuntime lex;
private final String botName;
private final String botAlias;
private final String username;
public LexClient(Region region, String botName, String botAlias, String username) {
this.botName = botName;
this.botAlias = botAlias;
this.username = username;
AmazonLexRuntimeClientBuilder builder = AmazonLexRuntimeClient.builder();
builder.setRegion(region.getName());
lex = builder.build();
}
public LexPostResult post(String contentType, byte[] audio, String accept) throws Exception {
PostContentRequest req = new PostContentRequest();
req.setBotName(botName);
req.setBotAlias(botAlias);
req.setUserId(username);
req.setAccept(accept);
req.setContentType(contentType);
InputStream is = new ByteArrayInputStream(audio);
req.setInputStream(is);
PostContentResult ret = lex.postContent(req);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
StreamUtils.copy(ret.getAudioStream(), baos);
LexPostResult res = new LexPostResult(ret.getIntentName(), ret.getDialogState(), ret.getSlots(), ret.getSlotToElicit(), ret.getSessionAttributes(), baos.toByteArray(), ret.getInputTranscript(), ret.getMessage());
return res;
}
}
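To make the client's usage concrete, here is a minimal sketch of how LexClient could be called directly; the bot name, alias, user id and the empty audio buffer are placeholders, and the real values are set later in SpeechController.
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.cardstore.lex.LexClient;
import com.cardstore.lex.LexPostResult;
public class LexClientExample {
public static void main(String[] args) throws Exception {
// 16 kHz, 16-bit, mono PCM audio recorded elsewhere (placeholder bytes here)
byte[] pcmAudio = new byte[0];
LexClient client = new LexClient(Region.getRegion(Regions.US_EAST_1), "CardStore", "$LATEST", "some-session-id");
LexPostResult result = client.post("audio/x-l16; sample-rate=16000; channel-count=1", pcmAudio, "audio/mpeg");
System.out.println(result.getDialogState() + " / " + result.getIntentName());
}
}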
The code for the LexPostResult class is below. This class will hold conversation data like the determined intent name, the dialog state, the next slot to elicit, the determined slot values, and a message in both text and audio form.
package com.cardstore.lex;
import java.util.HashMap;
import java.util.Map;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
public class LexPostResult {
private String intentName;
private String dialogState;
private Map<String, String> slots;
private String slotToElicit;
private Map<String, String> sessionAttributes;
private byte[] audio;
private String inputText;
private String responseText;
public LexPostResult(String intentName, String dialogState, String slots, String slotToElicit,
String sessionAttributes, byte[] audio, String inputText, String responseText) throws Exception {
super();
this.intentName = intentName;
this.dialogState = dialogState;
this.slots = stringToMap(slots);
this.slotToElicit = slotToElicit;
this.sessionAttributes = stringToMap(sessionAttributes);
this.audio = audio;
this.inputText = inputText;
this.responseText = responseText;
}
private Map<String, String> stringToMap(String jsonStr) throws Exception {
Map<String, String> map = new HashMap<String, String>();
if (jsonStr != null) {
ObjectMapper mapper = new ObjectMapper();
// convert JSON string to Map
map = mapper.readValue(jsonStr, new TypeReference<Map<String, String>>(){});
}
return map;
}
public String getIntentName() {
return intentName;
}
public String getDialogState() {
return dialogState;
}
public Map<String, String> getSlots() {
return slots;
}
public String getSlotToElicit() {
return slotToElicit;
}
public Map<String, String> getSessionAttributes() {
return sessionAttributes;
}
public byte[] getAudio() {
return audio;
}
public String getInputText() {
return inputText;
}
public String getResponseText() {
return responseText;
}
}
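The stringToMap helper exists because postContent returns the slots and session attributes as JSON strings. For example, a finished SellACard turn could produce a slots string like the one below; the values are made up for illustration.
// Hypothetical values, just to illustrate the parsing done in LexPostResult
String slotsJson = "{\"CardName\":\"Lion\",\"CardPrice\":\"30\"}";
LexPostResult example = new LexPostResult("SellACard", "ReadyForFulfillment", slotsJson, null, "{}", new byte[0], "yes", null);
System.out.println(example.getSlots().get("CardName")); // prints "Lion"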
4. Create the SpeechController
After creating the Lex client, we can use the client to
send the audio recorded in the browser to the Lex bot. The code for SpeechController class
is below.
The PCM audio format will be used with a sample rate of 16 kHz. The result of the Lex postContent request will be in audio/mpeg format, which will be played in the browser.
The createSpeechResultFromLexPostResult method will map the returned intent names to the commands that will be used in the browser, like ADD_CARD, SHOW_MY_CARDS, SELL_CARD and LOGOUT.
The initLexClient method will be used to initialize the Lex client when a user logs in and a session is created. The same Lex client will be used during the whole session. After the response audio is returned, it will be put in the session, and when the browser requests the audio it will be served from the session.
package com.cardstore.controller;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.servlet.http.HttpSession;
import org.springframework.stereotype.Controller;
import org.springframework.util.StreamUtils;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.cardstore.entity.User;
import com.cardstore.lex.LexClient;
import com.cardstore.lex.LexPostResult;
@Controller
public class SpeechController {
private static final String CONTENT_TYPE = "audio/x-l16; sample-rate=16000; channel-count=1";
private static final String ACCEPT = "audio/mpeg";
private static final String USER_LEX_CLIENT_KEY = "USER_LEX_CLIENT_KEY";
private static final String USER_SPEECH_RESPONSE_AUDIO_KEY = "USER_SPEECH_RESPONSE_AUDIO_KEY";
@RequestMapping(path="/speech")
@ResponseBody
public SpeechResult speechCommand(InputStream requestBodyStream, HttpSession session) throws Exception {
SpeechResult res = null;
User user = UserController.userfromSession(session);
if (user != null) {
LexClient client = getLextClientFromSession(session);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
StreamUtils.copy(requestBodyStream, baos);
byte[] audio = baos.toByteArray();
LexPostResult ret = client.post(CONTENT_TYPE, audio, ACCEPT);
res = createSpeechResultFromLexPostResult(ret);
putSpeechResponseAudioIntoSession(session, ret.getAudio());
}
return res;
}
private SpeechResult createSpeechResultFromLexPostResult(LexPostResult ret) {
SpeechResult res = new SpeechResult();
res.setCommand(SpeechResult.SPEECH_COMMAND.UNKNOWN);
res.setInputText(ret.getInputText());
res.setResponseText(ret.getResponseText());
if (ret.getDialogState().equals("ReadyForFulfillment")) {
if (ret.getIntentName().equals("LogOut"))
res.setCommand(SpeechResult.SPEECH_COMMAND.LOGOUT);
else if (ret.getIntentName().equals("AddCard")) {
res.setCommand(SpeechResult.SPEECH_COMMAND.ADD_CARD);
}
else if (ret.getIntentName().equals("ShowMyCards")) {
res.setCommand(SpeechResult.SPEECH_COMMAND.SHOW_MY_CARDS);
}
else if (ret.getIntentName().equals("SellACard")) {
res.setCommand(SpeechResult.SPEECH_COMMAND.SELL_CARD);
res.setCardName(ret.getSlots().get("CardName"));
res.setCardPrice(ret.getSlots().get("CardPrice"));
}
}
return res;
}
@RequestMapping(path="/speechResponseAudio", produces="audio/mpeg3")
public @ResponseBody byte[] speechResponseAudio(HttpSession session) throws IOException {
return getSpeechResponseAudioFromSession(session);
}
public static LexClient getLextClientFromSession(HttpSession session) {
return (LexClient)session.getAttribute(USER_LEX_CLIENT_KEY);
}
public static void putLexClientIntoSession(HttpSession session, LexClient client) {
session.setAttribute(USER_LEX_CLIENT_KEY, client);
}
public static byte[] getSpeechResponseAudioFromSession(HttpSession session) {
return (byte[])session.getAttribute(USER_SPEECH_RESPONSE_AUDIO_KEY);
}
public static void putSpeechResponseAudioIntoSession(HttpSession session, byte[] audio) {
session.setAttribute(USER_SPEECH_RESPONSE_AUDIO_KEY, audio);
}
public static void initLexClient(HttpSession session, String sessionId) {
LexClient client = new LexClient(Region.getRegion(Regions.US_EAST_1), "CardStore", "$LATEST", sessionId);
putLexClientIntoSession(session, client);
}
}
The code for SpeechResult
class is below.
package com.cardstore.controller;
public class SpeechResult {
public enum SPEECH_COMMAND { UNKNOWN, ADD_CARD, SHOW_MY_CARDS, SELL_CARD, LOGOUT };
SpeechResult.SPEECH_COMMAND command;
String cardName;
String cardPrice;
String inputText;
String responseText;
public SpeechResult.SPEECH_COMMAND getCommand() {
return command;
}
public void setCommand(SpeechResult.SPEECH_COMMAND command) {
this.command = command;
}
public String getCardName() {
return cardName;
}
public void setCardName(String cardName) {
this.cardName = cardName;
}
public String getCardPrice() {
return cardPrice;
}
public void setCardPrice(String cardPrice) {
this.cardPrice = cardPrice;
}
public String getInputText() {
return inputText;
}
public void setInputText(String inputText) {
this.inputText = inputText;
}
public String getResponseText() {
return responseText;
}
public void setResponseText(String responseText) {
this.responseText = responseText;
}
}
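With the default Jackson serialization that Spring applies to @ResponseBody, a ready-to-fulfill SellACard turn would produce a /speech response roughly like the following; the values are illustrative.
{
"command": "SELL_CARD",
"cardName": "Lion",
"cardPrice": "30",
"inputText": "yes",
"responseText": null
}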
Add the highlighted line below to the login method of the UserController class to initialize the Lex client when a user logs in.
@RequestMapping(value = "/login", method = RequestMethod.POST, produces = "text/plain")
@ResponseBody
public String login(@RequestBody User user, HttpServletRequest request) {
String error = "None";
User existing = userRepository.findOne(user.getUsername());
boolean canLogin = existing != null && existing.getPassword().equals(user.getPassword());
if (!canLogin)
error = "User name and password mismatch.";
else if (!existing.getActivationStatus().equals(User.ACTIVATION_STATUS_DONE))
error = "User is not activated.";
else {
HttpSession session = request.getSession(true);
session.setAttribute(USER_KEY_FOR_SESSION, existing);
SpeechController.initLexClient(session, session.getId());
}
return error;
}
Now, the bot and the server-side code are ready to be used from the client side. We can use the server code from the web app.
5. Change the dashboard to interact with audio commands
In the dashboard, we will create a chat window just like
the Lex Test Bot chat window. Conversations will be shown here.
Then, we will implement audio recording. We will record the audio using the browser's MediaRecorder API. After the audio is recorded for 4 seconds, we will first downsample it to 16 kHz and then remove the silence from it. If no non-silent audio is left, we won't send it to Lex. The last thing we will do before sending the audio is to convert the float audio data to 16-bit integer audio data. After the conversion, the data is sent to Lex.
The server code will send the audio recorded in the browser to Amazon Lex. Lex will process the audio input and return a prompt in audio form. The audio will be put in the session and requested by the browser to play. When an intent is ready for fulfillment, the server code will decide which command to invoke in the browser. The browser will execute the requested command, and a notification message will be played by generating speech with Amazon Polly.
We will start by replacing the code below in the original dashboard.jsp
function initNotifications() {
if (typeof (EventSource) !== "undefined") {
var source = new EventSource("/feed");
source.addEventListener('cardSold', function(event) {
var data = JSON.parse(event.data);
processCardSoldEvent(data);
});
}
}
</script>
</head>
<body onload="initNotifications()">
<div id="notif-container">
with the code below.
var speechRecorder = {};
function playAudioFromUrl(url, finishHandler) {
setSpeechStatus('Speaking...');
var audio = new Audio(url);
audio.onended = function() {
if (finishHandler)
finishHandler();
}
audio.play();
}
function stopRecording() {
speechRecorder.recorder.stop();
}
function startRecording() {
setSpeechStatus('Listening...');
speechRecorder.recorder.start();
setTimeout(stopRecording, 4000);
}
function handleLexResponse(speechRes) {
if (speechRes.command == 'LOGOUT') {
playChatResponse('Okay. You are logging out, good bye.', function() {
logout();
});
return;
}
replaceChatAudioInputLine(speechRes.inputText);
if (speechRes.command == 'UNKNOWN') {
addChatBotResponse(speechRes.responseText);
playAudioFromUrl('speechResponseAudio', startRecording);
}
else {
if (speechRes.command == 'ADD_CARD') {
playChatResponse('Okay. You can add your card using this form.', function() {
addCardClicked();
startRecording();
});
}
else if (speechRes.command == 'SHOW_MY_CARDS') {
playChatResponse('Okay. Here are your cards.', function() {
listMyCards();
startRecording();
});
}
else if (speechRes.command == 'SELL_CARD')
sellCard(speechRes.cardName, speechRes.cardPrice, function (resultMessage) {
playChatResponse(resultMessage, startRecording);
});
}
}
function sendAudioToLex(audioData) {
setSpeechStatus('Analyzing...');
addChatAudioInputLine();
$.ajax({
type: 'POST',
url: 'speech',
data: audioData,
contentType: false,
cache: false,
processData: false,
success: handleLexResponse,
error: function () {
alert("Can't send audio.");
startRecording();
}
});
}
function reSample(audioBuffer, targetSampleRate, onComplete) {
var channel = audioBuffer.numberOfChannels;
var samples = audioBuffer.length * targetSampleRate / audioBuffer.sampleRate;
var offlineContext = new OfflineAudioContext(channel, samples, targetSampleRate);
var bufferSource = offlineContext.createBufferSource();
bufferSource.buffer = audioBuffer;
bufferSource.connect(offlineContext.destination);
bufferSource.start(0);
offlineContext.startRendering().then(function(renderedBuffer){
onComplete(renderedBuffer);
})
}
var SILENCE_THRESHOLD = 0.04;
function removeSilence(buffer) {
var l = buffer.length;
var nonSilenceStart = 0;
var nonSilenceEnd = l;
while (nonSilenceStart < l) {
if (Math.abs(buffer[nonSilenceStart]) > SILENCE_THRESHOLD)
break;
nonSilenceStart++;
}
while (nonSilenceEnd > nonSilenceStart) {
if (Math.abs(buffer[nonSilenceEnd]) > SILENCE_THRESHOLD)
break;
nonSilenceEnd--;
}
var retBuffer = buffer;
if (nonSilenceStart != 0 || nonSilenceEnd != l) {
retBuffer = buffer.subarray(nonSilenceStart, nonSilenceEnd);
}
return retBuffer;
}
function convertFloat32ToInt16(buffer) {
buffer = removeSilence(buffer);
var l = buffer.length;
var buf = new Int16Array(l);
while (l--) {
buf[l] = Math.min(1, buffer[l]) * 0x7FFF;
}
return buf.buffer;
}
function initSpeechRecording() {
navigator.mediaDevices.getUserMedia({
audio: true
}).then(
function onSuccess(stream) {
var data = [];
speechRecorder.recorder = new MediaRecorder(stream);
speechRecorder.audioContext = new AudioContext();
speechRecorder.recorder.ondataavailable = function(e) {
data.push(e.data);
};
speechRecorder.recorder.onerror = function(e) {
throw e.error || new Error(e.name);
}
speechRecorder.recorder.onstart = function(e) {
data = [];
}
speechRecorder.recorder.onstop = function(e) {
setSpeechStatus('Checking silence...');
var blobData = new Blob(data, {type: 'audio/x-l16'});
var reader = new FileReader();
reader.onload = function() {
speechRecorder.audioContext.decodeAudioData(reader.result, function(buffer) {
reSample(buffer, 16000, function(newBuffer) {
var trimmedBuffer = removeSilence(newBuffer.getChannelData(0));
if (trimmedBuffer.length > 0) // if its not fully silence, send to Lex
sendAudioToLex(convertFloat32ToInt16(trimmedBuffer));
else
startRecording();
});
});
};
reader.readAsArrayBuffer(blobData);
}
});
}
var lastAudioInputId = 0;
function addChatAudioInputLine() {
var row$ = $('<p id="audioInput' + ++lastAudioInputId + '" class="me">Audio input</p>');
$('#chat').append(row$);
$("#chat").scrollTop($("#chat")[0].scrollHeight);
}
function replaceChatAudioInputLine(txt) {
$('#audioInput' + lastAudioInputId).html(txt);
}
function addChatBotResponse(txt) {
var row$ = $('<p class="bot">' + (txt || ' ') + '</p>');
$('#chat').append(row$);
$("#chat").scrollTop($("#chat")[0].scrollHeight);
}
function playChatResponse(txt, callback) {
addChatBotResponse(txt);
playAudioFromUrl('audio?msg=' + txt, callback);
}
function setSpeechStatus(txt) {
$('#speechStatus').html(txt);
}
function initPage() {
initNotifications();
initSpeechRecording();
playChatResponse('Welcome ${user.name}. Your current balance is ${user.balance}$. What would you like to do ?', startRecording);
}
</script>
</head>
<body onload="initPage()">
<div class="chatContainer">
<div id="speechStatus"></div>
<div id="chat" class="chat"></div>
</div>
There are also some other changes related to CSS styles, but they are not essential. For all the changes, please compare the old and new code.
Update:
There are a few required properties to run the application. You should specify them in the /src/main/resources/application.properties file like below.
user.activation.queue.name=
mail.from.address=
user.card.upload.s3.bucket.name=
user.card.upload.s3.bucket.region=
user.card.upload.s3.bucket.awsId=
user.card.upload.s3.bucket.awsSecret=
After we complete all the changes, our application should be ready to try. After you log in, the welcome message should be played and the application should start listening to your commands. There is a video below showing the usage of the application.
Next Steps
In this post, I have fulfilled the intents in the web
app. In real applications, Lambda functions can be used for slot validation and
fulfillment.
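As a rough idea of what such a function could look like, below is a minimal, hypothetical fulfillment Lambda for the SellACard intent written against the Lex v1 event format. It is not part of the sample project, and the actual card update logic is omitted.
package com.cardstore.lambda; // hypothetical package, not part of the sample project
import java.util.HashMap;
import java.util.Map;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
public class SellACardFulfillment implements RequestHandler<Map<String, Object>, Map<String, Object>> {
@SuppressWarnings("unchecked")
public Map<String, Object> handleRequest(Map<String, Object> event, Context context) {
// Lex v1 passes the matched intent and its slot values in "currentIntent"
Map<String, Object> currentIntent = (Map<String, Object>) event.get("currentIntent");
Map<String, Object> slots = (Map<String, Object>) currentIntent.get("slots");
String cardName = (String) slots.get("CardName");
String cardPrice = (String) slots.get("CardPrice");
// Here the card would actually be put on sale, e.g. by updating DynamoDB.
Map<String, Object> message = new HashMap<>();
message.put("contentType", "PlainText");
message.put("content", "Your card " + cardName + " is now on sale for " + cardPrice + "$.");
Map<String, Object> dialogAction = new HashMap<>();
dialogAction.put("type", "Close");
dialogAction.put("fulfillmentState", "Fulfilled");
dialogAction.put("message", message);
Map<String, Object> response = new HashMap<>();
response.put("dialogAction", dialogAction);
return response;
}
}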
Also, for simplicity, audio is recorded for 4 seconds, and if all the recorded data is silence, the audio is not sent to Lex. In real applications, we can create a ScriptProcessorNode to analyze the recorded audio data in real time and stop the recording once silence is detected for a specific duration. For more information, see the Web Audio API docs.
For more information on using Lex within a web app, see this Amazon blog post; you can find the code here.
Summary
In this post, I developed an Amazon Lex bot to be used from a web application and used that bot to control the web app's functions. To read more about AWS AI services, stay tuned.