How can I enable barge-in for my ASR scenario?

Barge-in means that the user can start speaking before a TTS output has finished, thereby interrupting it. How you enable it depends on the ASR mode.

  • In push-to-talk mode, your application needs to listen for the ASR event that signals that the PTT button was pressed. There is both a connector message and a SiAM-dp DeviceStateChanged client notification available for that purpose. When this signal is received, the application should cancel all TTS output.
  • In speak-to-activate mode, your application similarly needs to listen for the ASR event that signals detected speech, and cancel all TTS output when it arrives. An additional challenge in this mode is to prevent the TTS output itself from triggering ASR events when loudspeakers are used (echo). This requires acoustic echo cancellation (AEC). Audio Manager provides the Voice Capture DSP for that purpose, which you can add to your configuration (see “Voice Capture DSP” in section “Common Plug-ins” of the Audio Manager documentation for details).
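
The logic common to both modes can be sketched as follows. This is a minimal illustration in Python, not SiAM-dp code: AsrEvent, AsrMode, FakeTts and BargeInController are hypothetical names invented for this example, and the actual event names and the TTS cancel call depend on your platform and must be taken from its documentation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class AsrEvent(Enum):
    """Hypothetical ASR event types; real names depend on your platform."""
    PTT_PRESSED = auto()
    SPEECH_DETECTED = auto()
    RECOGNITION_RESULT = auto()


class AsrMode(Enum):
    PUSH_TO_TALK = auto()
    SPEAK_TO_ACTIVATE = auto()


@dataclass
class FakeTts:
    """Stand-in for a TTS engine that only tracks queued utterances."""
    queue: list = field(default_factory=list)

    def speak(self, text):
        self.queue.append(text)

    def cancel_all(self):
        self.queue.clear()

    @property
    def is_speaking(self):
        return bool(self.queue)


class BargeInController:
    """Cancels all TTS output when the mode-specific trigger event arrives."""

    # Assumption: one event per mode signals that the user starts speaking.
    _TRIGGERS = {
        AsrMode.PUSH_TO_TALK: AsrEvent.PTT_PRESSED,
        AsrMode.SPEAK_TO_ACTIVATE: AsrEvent.SPEECH_DETECTED,
    }

    def __init__(self, tts, mode):
        self.tts = tts
        self.mode = mode

    def on_asr_event(self, event):
        """Return True if the event interrupted running TTS output."""
        if event is self._TRIGGERS[self.mode] and self.tts.is_speaking:
            self.tts.cancel_all()
            return True
        return False


# Usage: the PTT press interrupts the running prompt.
tts = FakeTts()
tts.speak("Welcome. How can I help you?")
controller = BargeInController(tts, AsrMode.PUSH_TO_TALK)
controller.on_asr_event(AsrEvent.PTT_PRESSED)
```

Note that the sketch does not cover echo. In speak-to-activate mode with loudspeakers, the speech event would also fire on the TTS output itself unless AEC is in place, so the handler would cancel every prompt as soon as it starts.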

Category: Speech / Audio
