Tuesday, May 23, 2017

Look Who's Talking! Creating a Web Application That Talks with Amazon Polly


Web and mobile applications combined with smart devices like Amazon Echo and Google Home can provide very interesting interactions today. Recently, Google Home announced a feature that lets you listen to the steps of your favorite recipe while cooking. You can view the video on YouTube. For more information, see Google Home Help.
For a few months, I have been writing about various AWS services on my blog. I have created a digital card store web application for managing digital game cards and added features to it using various AWS services. In this simple application, we can add a card with a card image, put the card on sale, and buy a card.
After the Google Home announcement, I wondered whether it was possible to notify card owners with natural speech when their cards are sold. With Amazon Polly, it sure is.
Amazon Polly is a service that turns text into speech. With Polly, we can create applications that talk in a natural, human-like voice. Polly supports 47 different voices across 24 languages, and Turkish is one of them :). You can try Filiz, a Turkish female voice, here.
In this post, I will use Amazon Polly to generate a speech-based notification that tells users when their cards are bought by other users. I will use Server-Sent Events to deliver the notification in real time and play an audio file generated by Amazon Polly. The picture below shows the notification flow.

For this post, I will use an English voice. You can listen to a sample here, generated for the text below.
Your card, Gandalf, has been bought by John Doe, for 45$

As a starting point, I will use the code I developed in this post, which can be found in my GitHub repository.
You can also find a comparison of Server-Sent Events with long polling, WebSockets and Comet here.

The steps are below.

1. Enable Server-Sent Events
2. Send notification when cards are bought
3. Enable Polly
4. Generate audio with Amazon Polly
5. Change the dashboard to play audio notifications

Let's start.

1. Enable Server-Sent Events
I will use Spring's SseEmitter class to implement Server-Sent Events. Add an sseEmitters field to the UserController class, as shown below, to hold a map from user name to the notification channel for that user.
@Controller
public class UserController {

       public static final String USER_KEY_FOR_SESSION = "USER";

       private static Map<String, SseEmitter> sseEmitters = new HashMap<String, SseEmitter>();


Add the support methods below for creating a new notification channel for a user, getting the channel for a user, removing the channel for a user, and sending a notification to the user.
private synchronized SseEmitter newEmitterForUser(String username) {
            
       SseEmitter emitter = new SseEmitter();
      
       Runnable remover = new Runnable() {
             @Override
             public void run() {
                    removeEmitter(username);
             }
       };
      
       emitter.onCompletion(remover);
       emitter.onTimeout(remover);

       sseEmitters.put(username, emitter);

       return emitter;
}

private synchronized SseEmitter getEmitterForUser(String username) {
       return sseEmitters.get(username);
}
      
synchronized void removeEmitter(String username) {
       sseEmitters.remove(username);
}

void notifyUser(String username, String eventName, Object data) {
       SseEmitter emitter = getEmitterForUser(username);
      
       if (emitter != null)
             try {
                    emitter.send(SseEmitter.event().name(eventName).data(data));
             } catch (Exception e) {
                    e.printStackTrace();
             }
}
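
One subtlety in a registry like this: when a user reloads the dashboard, a new emitter replaces the old one under the same user name, and the old emitter's completion or timeout callback may run afterwards and remove the new entry. The sketch below shows one possible guard; it is not the code used in this application. It uses a plain generic class (ChannelRegistry is a hypothetical name) instead of SseEmitter so that it runs without Spring, and it relies on ConcurrentHashMap.remove(key, value), which removes an entry only if it still maps to the given value.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the emitter map; C would be SseEmitter in the application.
class ChannelRegistry<C> {

    private final ConcurrentMap<String, C> channels = new ConcurrentHashMap<>();

    // Stores the channel for the user, replacing any previous one.
    void register(String username, C channel) {
        channels.put(username, channel);
    }

    // Returns the current channel for the user, or null if there is none.
    C get(String username) {
        return channels.get(username);
    }

    // Removes the entry only if it still points to this exact channel,
    // so a late callback from an old channel cannot evict a newer one.
    boolean removeIfCurrent(String username, C channel) {
        return channels.remove(username, channel);
    }

    public static void main(String[] args) {
        ChannelRegistry<Object> registry = new ChannelRegistry<>();
        Object oldChannel = new Object();
        Object newChannel = new Object();

        registry.register("alice", oldChannel);
        registry.register("alice", newChannel); // the user reloaded the page

        // Late completion callback of the old channel: the new entry stays.
        System.out.println(registry.removeIfCurrent("alice", oldChannel));
        System.out.println(registry.get("alice") == newChannel);
    }
}
```

With SseEmitter, the completion and timeout callbacks would call removeIfCurrent with the emitter they were registered on.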

Add the feed method below to establish the notification channel once a user is logged in. This method will be called from dashboard.jsp and creates a channel for sending notifications from the server to the browser.
       @RequestMapping("/feed")
       public ResponseBodyEmitter feed(HttpSession session) {
             SseEmitter emitter = null;
             User user = userfromSession(session);
            
             if (user != null) {
                    emitter = newEmitterForUser(user.getUsername());
             }
             return emitter;
       }

2. Send notification when cards are bought
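Add a CardSoldEvent class to the com.cardstore.entity package with the fields below. This class holds the data about a card sold event. The class also needs getters for these fields (not shown here), because the dashboard reads them from the JSON form of the event.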
public class CardSoldEvent {
       private String name;
       private double price;
       private String oldOwner;
       private double oldOwnerBalance;
       private String newOwner;
      
       public CardSoldEvent(String name, double price, String oldOwner, double oldOwnerBalance, String newOwner) {
             super();
             this.name = name;
             this.price = price;
             this.oldOwner = oldOwner;
             this.oldOwnerBalance = oldOwnerBalance;
             this.newOwner = newOwner;
       }

Add a userController field to the CardController class as shown below.
@RestController
public class CardController {

       @Autowired
       CardRepository cardRepository;

       @Autowired
       UserRepository userRepository;
      
       @Autowired
       UserController userController;

Then add the lines below to the buyCard method to send a notification to the seller of the card. The notification is sent as an event named cardSold.
CardSoldEvent event = new CardSoldEvent(cardToBuy.getName(), cardToBuy.getPrice(), seller.getName(), seller.getBalance(), currentUser.getName());

userController.notifyUser(seller.getUsername(), "cardSold", event);
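
To make the notification concrete, here is a minimal sketch of how a named event looks in the text/event-stream format that Server-Sent Events use. It is not Spring's implementation: SseFormat is a hypothetical helper, and the sample JSON payload is made up, since the exact JSON depends on how CardSoldEvent is serialized. Each event is a block of field lines terminated by a blank line, and multi-line data is sent as several data lines.

```java
// Hypothetical helper illustrating the text/event-stream wire format.
class SseFormat {

    // Formats one named event. Each line of the payload gets its own "data:" field,
    // and an empty line marks the end of the event.
    static String format(String eventName, String data) {
        StringBuilder sb = new StringBuilder();
        sb.append("event: ").append(eventName).append('\n');
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        sb.append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative payload only; the field values are made up.
        String json = "{\"name\":\"Gandalf\",\"price\":45.0,\"newOwner\":\"John Doe\"}";
        System.out.print(format("cardSold", json));
    }
}
```

In the browser, the event field decides which addEventListener handler runs, and the data field arrives as event.data.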

3. Enable Polly
Add the Maven dependency for the Amazon Polly SDK.
              <dependency>
                    <groupId>com.amazonaws</groupId>
                    <artifactId>aws-java-sdk-polly</artifactId>
                    <version>1.11.62</version>
             </dependency>

Add a PollyHelper class with the code below. Amazon Polly is not yet available in every region, so the region parameter sets the region to use. The synthesize method creates an audio stream for the given text in the requested output format.
public class PollyHelper {

       private final AmazonPollyClient polly;
       private final String voiceId = "Joanna";
      
       public PollyHelper(Region region) {
             // create an Amazon Polly client in a specific region
             polly = new AmazonPollyClient();
             polly.setRegion(region);
       }

       public InputStream synthesize(String text, OutputFormat format) throws IOException {
             SynthesizeSpeechRequest synthReq = new SynthesizeSpeechRequest().withText(text).withVoiceId(voiceId).withOutputFormat(format);
            
             SynthesizeSpeechResult synthRes = polly.synthesizeSpeech(synthReq);

             return synthRes.getAudioStream();
       }
}

4. Generate audio with Amazon Polly
Create the AudioController class below to return an MP3 file generated by Amazon Polly. I have used the eu-west-1 region, as it is the only region in Europe where Polly is available as of today.
@Controller
public class AudioController {

       @RequestMapping(path="/audio", produces="audio/mpeg")
       public @ResponseBody byte[] textToSpeech(@RequestParam("msg") String msg) throws IOException {
                    PollyHelper helper = new PollyHelper(Region.getRegion(Regions.EU_WEST_1));
                   
                    InputStream is = helper.synthesize(msg, OutputFormat.Mp3);
                   
                    return StreamUtils.copyToByteArray(is);
       }
}

5. Change the dashboard to play audio notifications
Add the functions below to the dashboard.jsp file. The initNotifications function creates an EventSource object that makes a request to the /feed URL. This request opens a notification channel from the server to the browser. When a notification is sent from the server, the EventSource object calls the matching event handler. In this code, we add an event listener for the 'cardSold' event. When a 'cardSold' event arrives, the notification text is shown in the notification area and the audio is played by the playAudio function. The audio is requested from the /audio URL, which maps to the AudioController.textToSpeech method created in the previous step.
function messageForData(data) {
       return "Your card, " + data.name + ", has been bought by " + data.newOwner + ", for " + data.price + "$";
}
      
function playAudio(data) {
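       // Encode the message so spaces, commas and symbols survive in the query string
       var audio = new Audio('audio?msg=' + encodeURIComponent(messageForData(data)));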
       audio.play();
}
      
function animateSpeaker() {
       $('#speaker').fadeTo("slow", 0.15).delay(400).fadeTo("slow", 1).delay(400).fadeTo("slow", 0.15).delay(400).fadeTo("slow", 1).delay(400).fadeTo("slow", 0.15).delay(400).fadeTo("slow", 1);
}
      
function setNotification(data) {
       var msg = messageForData(data);
       var spn$ = $('<span/>').html(msg + '&nbsp;<img id="speaker" src="images/speaker.png"/>');
       $('#notifications').empty().append(spn$);
       animateSpeaker();
}
      
function processCardSoldEvent(data) {
       setBalance(data.oldOwnerBalance);
       setNotification(data);
       playAudio(data);
}
      
function initNotifications() {
       if (typeof (EventSource) !== "undefined") {
             var source = new EventSource("/feed");
             source.addEventListener('cardSold', function(event) {
                    var data = JSON.parse(event.data);
                    processCardSoldEvent(data);
             });
       }
}

After creating the functions, we can use them as below. The initNotifications function is called to initialize notifications when the dashboard page is loaded.
<body onload="initNotifications()">
       <div id="notif-container">
             <div id="notif-title">
                    <img src="images/notification.png"/>
                    <span>Notifications</span>
             </div>
             <div id="notifications"></div>
       </div>

After completing the application, you can use it as shown in the video below.



Other Considerations
In this post, I have used a fixed language and voice. In a real application, we can let users select their preferred language and voice.
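As a rough sketch of voice selection, a small lookup from a language tag to a voice name could supply the voice ID that is passed to Polly. VoiceSelector is a hypothetical helper. Only Joanna and Filiz, the two voices mentioned in this post, are listed, and the fallback choice is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: picks a Polly voice name for a user's language tag.
class VoiceSelector {

    private static final String DEFAULT_VOICE = "Joanna"; // assumed fallback

    private static final Map<String, String> VOICES = new HashMap<>();
    static {
        VOICES.put("en-US", "Joanna");
        VOICES.put("tr-TR", "Filiz");
    }

    // Returns the voice for the language tag, or the default when it is unknown or null.
    static String voiceFor(String languageTag) {
        if (languageTag == null) {
            return DEFAULT_VOICE;
        }
        return VOICES.getOrDefault(languageTag, DEFAULT_VOICE);
    }
}
```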
Also, for simplicity, I have developed a generic endpoint that generates audio for any text sent by the browser. In production, the audio should be prepared, and possibly cached, on the server to prevent uncontrolled Polly usage.
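One way to limit Polly calls is to cache the generated audio by message text, so that repeated notifications with the same wording are synthesized only once. The sketch below is only an illustration under several assumptions: AudioCache is a hypothetical class, the synthesizer function stands in for a call such as PollyHelper.synthesize, and the cache is unbounded, which a real application would need to address.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache in front of a text-to-speech call.
class AudioCache {

    private final ConcurrentMap<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> synthesizer;

    AudioCache(Function<String, byte[]> synthesizer) {
        this.synthesizer = synthesizer;
    }

    // Returns cached audio for the text, invoking the synthesizer only on a miss.
    byte[] audioFor(String text) {
        return cache.computeIfAbsent(text, synthesizer);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Fake synthesizer: counts calls and returns the text bytes instead of audio.
        AudioCache cache = new AudioCache(text -> {
            calls.incrementAndGet();
            return text.getBytes(StandardCharsets.UTF_8);
        });

        String msg = "Your card, Gandalf, has been bought by John Doe, for 45$";
        cache.audioFor(msg);
        cache.audioFor(msg); // served from the cache
        System.out.println("synthesizer calls: " + calls.get());
    }
}
```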
For simplicity, I have used Server-Sent Events for real-time notification. SSE is not supported by IE; please check browser support here.
I want to mention one more thing about SSE. SseEmitter objects are valid only in the JVM in which they are created. If we use multiple EC2 instances, the SseEmitter may not be in the JVM that processes the buyCard request. In this post, I have used one instance for simplicity. In production, each EC2 instance that establishes an SseEmitter connection would subscribe to a topic specific to the user.
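The per-user topic idea can be sketched with an in-memory stand-in for a message broker. UserTopicBus is hypothetical and runs in a single JVM purely for illustration. In a real deployment, the bus would be an external publish/subscribe service shared by all instances, and the subscriber callback would forward the event to the local SseEmitter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-memory bus with one topic per user name.
class UserTopicBus {

    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    // Called by the instance that holds the user's SSE connection.
    void subscribe(String username, Consumer<String> handler) {
        subscribers.computeIfAbsent(username, k -> new CopyOnWriteArrayList<>()).add(handler);
    }

    // Called by whichever instance processes the purchase.
    // Returns the number of subscribers that received the event.
    int publish(String username, String event) {
        List<Consumer<String>> handlers = subscribers.get(username);
        if (handlers == null) {
            return 0;
        }
        int delivered = 0;
        for (Consumer<String> handler : handlers) {
            handler.accept(event);
            delivered++;
        }
        return delivered;
    }

    public static void main(String[] args) {
        UserTopicBus bus = new UserTopicBus();
        List<String> deliveredByInstanceB = new ArrayList<>();

        // Instance B holds the seller's connection and subscribes to the seller's topic.
        bus.subscribe("seller", deliveredByInstanceB::add);

        // Instance A handles buyCard and publishes without knowing where the seller is connected.
        bus.publish("seller", "cardSold");
        System.out.println(deliveredByInstanceB);
    }
}
```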

Summary
In this post, I have developed notification functionality with Server-Sent Events and Amazon Polly to notify users with natural speech.
The code can be found here.
To read more about AWS services, stay tuned.