interrupt speech with button click by RomanLut · Pull Request #9 · akdeb/ElatoAI · GitHub

interrupt speech with button click #9

Open: RomanLut wants to merge 2 commits into main

RomanLut (Contributor) commented Jun 5, 2025

This PR should allow speech to be interrupted via a button click.
Unfortunately, I haven't been able to verify it, as I haven't compiled the server.

akdeb (Owner) commented Jun 6, 2025

Thanks for submitting the PR @RomanLut! This is one of the items on the roadmap.

Let me try this. When I previously tried to implement interrupt, the problem was that audio bytes (from OpenAI) were still on their way to the ESP32 when the user pressed the button. As a result, the device doesn't switch back to listening cleanly, and the audio it captures (from the user) is murky or sped up.

Example case:

Server sends: "Hey! How can I help [interrupt] you today?" [User only hears "Hey! How can I help"]
User says: "What's the weather like in Baku, Azerbaijan?"
OpenAI hears: gibberish ..... like in Baku, Azerbaijan?

So there was a processing conflict: the "you today?" audio was still on its way to the ESP32 over the websocket while the device was already registering the overlapping audio from the user ("What's the weather").
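
The race in the example case can be modeled in a few lines. This is only an illustrative sketch: the `AudioChunk` type, the `splitAtInterrupt` function, and all timestamps are made up for the example and do not come from the ElatoAI code.

```typescript
// Hypothetical model of the race described above; none of these names come from ElatoAI.
interface AudioChunk {
  text: string;        // words carried by this chunk (stand-in for raw audio bytes)
  arrivesAtMs: number; // time at which the chunk reaches the ESP32 (illustrative value)
}

// Split chunks into those that arrived before the button press
// and those that were still in flight over the websocket.
function splitAtInterrupt(chunks: AudioChunk[], interruptAtMs: number) {
  const heard = chunks.filter((c) => c.arrivesAtMs <= interruptAtMs);
  const inFlight = chunks.filter((c) => c.arrivesAtMs > interruptAtMs);
  return { heard, inFlight };
}

const chunks: AudioChunk[] = [
  { text: "Hey!", arrivesAtMs: 100 },
  { text: "How can I help", arrivesAtMs: 400 },
  { text: "you today?", arrivesAtMs: 900 },
];

// The user presses the button at 500 ms (assumed for illustration).
const { heard, inFlight } = splitAtInterrupt(chunks, 500);
console.log(heard.map((c) => c.text).join(" "));    // Hey! How can I help
console.log(inFlight.map((c) => c.text).join(" ")); // you today?
```

Anything left in `inFlight` still arrives at the device after it has started capturing the user, which is the overlap described above.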

akdeb self-requested a review June 6, 2025 10:40
RomanLut (Contributor, Author) commented Jun 6, 2025

In this PR, the behavior is as follows:

When the user clicks the button, the client sends the following JSON to the server:
{"type": "instruction", "msg": "INTERRUPT", "audio_end_ms": 1000}
The server already contains partial support for handling the interruption.
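
A minimal sketch of how the server could validate the interrupt message shown above before acting on it. The field names are taken from the JSON in this comment; the `InterruptInstruction` type and the `parseInterrupt` function are hypothetical and are not part of this PR.

```typescript
// Shape taken from the JSON above; the type and function names are hypothetical.
interface InterruptInstruction {
  type: "instruction";
  msg: "INTERRUPT";
  audio_end_ms: number;
}

// Returns the parsed instruction, or null if the text frame is anything else.
function parseInterrupt(raw: string): InterruptInstruction | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const m = data as Record<string, unknown>;
  if (m.type !== "instruction" || m.msg !== "INTERRUPT") return null;

  // audio_end_ms must be a finite, non-negative number of milliseconds.
  const audioEndMs = m.audio_end_ms;
  if (typeof audioEndMs !== "number" || !Number.isFinite(audioEndMs) || audioEndMs < 0) {
    return null;
  }
  return { type: "instruction", msg: "INTERRUPT", audio_end_ms: audioEndMs };
}

const parsed = parseInterrupt('{"type": "instruction", "msg": "INTERRUPT", "audio_end_ms": 1000}');
console.log(parsed?.audio_end_ms); // 1000
```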

I also added the following command on the server:

client.realtime.send("response.cancel", {
    type: "response.cancel",
    event_id: RealtimeUtils.generateId("evt_")
});

This should instruct the LLM to stop audio generation.
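
To show where the cancel call might sit, here is a hedged sketch that wires it into a message handler and exercises it against a recording stub instead of a live connection. `RealtimeLike`, `makeEventId`, `handleClientMessage`, and `fakeRealtime` are all hypothetical names; only the `send("response.cancel", ...)` call and its payload mirror the snippet above.

```typescript
// Minimal stand-in for the realtime client's send(eventName, data) call used above.
interface RealtimeLike {
  send(eventName: string, data: Record<string, unknown>): void;
}

// Hypothetical ID helper standing in for RealtimeUtils.generateId("evt_").
function makeEventId(prefix: string): string {
  return prefix + Math.random().toString(36).slice(2, 12);
}

// Hypothetical handler: cancel the in-progress response when an
// INTERRUPT instruction arrives. Returns true if a cancel was sent.
function handleClientMessage(
  realtime: RealtimeLike,
  message: { type?: string; msg?: string },
): boolean {
  if (message.type !== "instruction" || message.msg !== "INTERRUPT") return false;
  realtime.send("response.cancel", {
    type: "response.cancel",
    event_id: makeEventId("evt_"),
  });
  return true;
}

// Recording stub so the flow can be exercised without a network connection.
const sent: Array<{ eventName: string; data: Record<string, unknown> }> = [];
const fakeRealtime: RealtimeLike = {
  send: (eventName, data) => {
    sent.push({ eventName, data });
  },
};

handleClientMessage(fakeRealtime, { type: "instruction", msg: "INTERRUPT" });
console.log(sent[0].eventName); // response.cancel
```

Keeping the realtime client behind a small interface like this is one way to check the interrupt path without compiling or running the full server.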

As a result, I expect the speaking to stop after a brief delay, and the client should return to listening mode as usual.
