[Help] Custom Audio & Voice using LLM · Issue #312 · react-chatbotify/react-chatbotify · GitHub

[Help] Custom Audio & Voice using LLM #312


Closed
KillerShoaib opened this issue May 12, 2025 · 5 comments
Labels
help wanted Extra attention is needed

Comments

@KillerShoaib

Hey @tjtanjin it's me again. I created an issue earlier and it was solved (thanks to you). We're using react-chatbotify in our organization: we're building a RAG system, and react-chatbotify powers the chat bubble. Everything text-related is already implemented. The next challenge is to add an audio and voice feature. I've searched the entire documentation and gone through the codebase to see whether there is a way to implement custom voice and audio. Here is what I want to achieve:

  1. The mic button records audio and sends it to a backend, where the audio is passed to an LLM that transcribes it and returns the response as text. I've implemented the backend using FastAPI, but I couldn't figure out how to send the captured audio from react-chatbotify to the backend.
  2. When the audio toggle is on, the audio comes from the backend (a Gemini or OpenAI LLM), and that audio is played inside react-chatbotify.
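For point 1, a minimal sketch of capturing mic audio with the browser MediaRecorder API and POSTing it to a FastAPI backend could look like this. The `/transcribe` path, the `file` field name, and the `{ text }` response shape are all assumptions about the backend, not anything react-chatbotify provides:

```typescript
// Hypothetical endpoint path; adjust to the actual FastAPI route.
export const transcribeUrl = (apiBase: string): string =>
  `${apiBase.replace(/\/$/, "")}/transcribe`;

// Record from the mic until stop() is called, then POST the audio and
// resolve with the transcript. Browser-only (getUserMedia, MediaRecorder).
export async function startRecording(apiBase: string) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.start();

  return {
    // Call this when the user clicks the mic a second time.
    stop: (): Promise<string> =>
      new Promise((resolve, reject) => {
        recorder.onstop = async () => {
          const form = new FormData();
          // FastAPI side: an UploadFile parameter named "file" (assumed).
          form.append("file", new Blob(chunks, { type: "audio/webm" }), "recording.webm");
          try {
            const res = await fetch(transcribeUrl(apiBase), { method: "POST", body: form });
            const data = await res.json();
            resolve(data.text); // assumed response shape: { text: "..." }
          } catch (err) {
            reject(err);
          }
        };
        recorder.stop();
        stream.getTracks().forEach((t) => t.stop()); // release the mic
      }),
  };
}
```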

From my reading of the code and documentation (though I'm quite new to React), I couldn't find a way to integrate this. Is there any way to implement it, if not directly then through custom hooks or components?

TIA.

@KillerShoaib KillerShoaib added the help wanted Extra attention is needed label May 12, 2025
@tjtanjin
Member

Hey @KillerShoaib, the core library currently relies on browser built-in features for both voice and audio. It also doesn't expose any settings that will allow for a drop-in replacement for customized voice and audio. It's likely that even if custom solutions are introduced, they will come in the form of plugins.

On that note, the LLM Connector Plugin was just released; you may find its codebase useful for reference.

Off the top of my head, I imagine you'll have to implement custom buttons for voice/audio, then rely on the useAudio and useTextArea hooks. These will be used to speak out the text (if you are ok with the built-in audio) and to set the text area value after transcribing on the backend.

@KillerShoaib
Author


Hey, thanks @tjtanjin! I was able to solve the initial STT part. Here is my approach (inspired by your suggestion):

  1. I create a new mic button in the text area.
  2. Clicking it starts recording the mic audio manually.
  3. When the user clicks the mic again, recording stops and the audio is sent to the backend API.
  4. Meanwhile, the text box is toggled off until the response arrives.
  5. After getting the response, I use setTextAreaValue to insert the transcribed message into the text field and toggle the text area back on.
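The on/off flow in those steps can be sketched as a tiny state machine. All names here are illustrative, not part of react-chatbotify:

```typescript
// States of the mic flow described above.
type MicState = "idle" | "recording" | "transcribing";
type MicEvent = "CLICK_MIC" | "TRANSCRIPT_READY" | "ERROR";

export function micReducer(state: MicState, event: MicEvent): MicState {
  switch (state) {
    case "idle":
      // First mic click starts recording.
      return event === "CLICK_MIC" ? "recording" : state;
    case "recording":
      // Second click stops recording and sends the audio to the backend.
      return event === "CLICK_MIC" ? "transcribing" : state;
    case "transcribing":
      // Back to idle once the transcript arrives (or the request fails).
      return event === "TRANSCRIPT_READY" || event === "ERROR" ? "idle" : state;
  }
}

// The text area is disabled exactly while a transcription is in flight.
export const isTextAreaDisabled = (s: MicState): boolean => s === "transcribing";
```

Driving the reducer from the button's click handler keeps the "toggle the text box off" logic in one place instead of scattering boolean flags.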

So far this works, and I'm on the TTS part now; I will update later. But one issue I'm having is with the custom button: I can't switch the button between SVGs. I can see the right icon initially, but that's it. Clicking and changing the value does nothing (the icon never changes). Here is the code of the button component:

Button.tsx

import React from "react";

// Props for the button; baseStyle, enabledStyle and disabledStyle are
// CSSProperties objects defined elsewhere in the file.
interface MicrophoneButtonProps {
  isRecording: boolean;
  onClick: () => void;
  isDisabled?: boolean;
}

export const MicrophoneButton: React.FC<MicrophoneButtonProps> = ({
  isRecording,
  onClick,
  isDisabled = false,
}) => {
  const handleClick = () => {
    if (!isDisabled) {
      onClick();
    }
  };

  return (
    <div
      role="button"
      tabIndex={isDisabled ? -1 : 0} // Make it focusable only when enabled
      onClick={handleClick}
      style={{
        ...baseStyle,
        ...(isDisabled ? disabledStyle : enabledStyle),
      }}
      aria-label={isRecording ? "Stop recording" : "Start recording"}
      data-testid="rcb-microphone-button" // Optional: for testing
    >
      <span className="rcb-custom-mic-icon" data-testid="rcb-custom-mic-icon-span">
        {isRecording ? (
          // Mic Mute icon (recording is active, button shows "stop/mute")
          <svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" viewBox="0 0 16 16">
            <path d="M13 8c0 .564-.094 1.107-.266 1.613l-.814-.814A4 4 0 0 0 12 8V7a.5.5 0 0 1 1 0zm-5 4c.818 0 1.578-.245 2.212-.667l.718.719a5 5 0 0 1-2.43.923V15h3a.5.5 0 0 1 0 1h-7a.5.5 0 0 1 0-1h3v-2.025A5 5 0 0 1 3 8V7a.5.5 0 0 1 1 0v1a4 4 0 0 0 4 4m3-9v4.879l-1-1V3a2 2 0 0 0-3.997-.118l-.845-.845A3.001 3.001 0 0 1 11 3"/>
            <path d="m9.486 10.607-.748-.748A2 2 0 0 1 6 8v-.878l-1-1V8a3 3 0 0 0 4.486 2.607m-7.84-9.253 12 12 .708-.708-12-12z"/>
          </svg>
        ) : (
          // Standard Microphone Icon (recording is stopped, button shows "start")
          <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" fill="currentColor" viewBox="0 0 24 24">
             <path d="M12 14c1.66 0 3-1.34 3-3V5c0-1.66-1.34-3-3-3S9 3.34 9 5v6c0 1.66 1.34 3 3 3zm5.3-3c0 3-2.54 5.1-5.3 5.1S6.7 14 6.7 11H5c0 3.41 2.72 6.23 6 6.72V21h2v-3.28c3.28-.49 6-3.31 6-6.72h-1.7z"/>
          </svg>
        )}
      </span>
    </div>
  );
}; 

Then I pass the button into the settings, with a piece of state tracking whether we're recording:

// the state tracking whether we're recording
const [shouldRecord, setShouldRecord] = useState(false);

// passing the button
chatInput:{
        buttons: [
          <MicrophoneButton
            key="mic-button"
            isRecording={shouldRecord}
            onClick={() => setShouldRecord(prev => !prev)}
            isDisabled={isProcessingAudio || (shouldRecord && status !== 'idle' && status !== 'stopped' && status !=='recording')}
          />,
          Button.SEND_MESSAGE_BUTTON
        ]
      }

I can see the button, but I can't toggle it on/off; it always looks the same. Recording itself works (so shouldRecord must be returning the correct value, otherwise recording wouldn't start, since it uses the same variable), yet the icon never updates.

Am I doing something wrong here, or can a custom button simply not be updated?

TIA

@tjtanjin
Member
tjtanjin commented May 15, 2025

Hey @KillerShoaib, the custom button is inside settings, which is essentially a configuration object (not part of the JSX tree). When you update the shouldRecord state, it doesn't re-create the button because settings isn't being re-evaluated. You have a few options from here:

  • Use context (triggers re-renders in all consumers)
  • Force settings to re-create the button with an updateSettings hook (this feels like a hack)
  • Maintain a state within your MicrophoneButton just for the visual updates (probably easiest and should work for your case)
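The third option can be sketched framework-free as a closure that owns its own flag; inside React this corresponds to a useState call inside MicrophoneButton itself, so a click always re-renders the icon even though the settings object is never re-evaluated. All names here are illustrative:

```typescript
// Framework-free analogue of option 3: the button "owns" its visual state.
// In React, `recording` would be useState(false) inside MicrophoneButton.
export function createMicButtonState(onToggle: (recording: boolean) => void) {
  let recording = false; // internal state, drives which SVG is shown

  return {
    // Called from the button's click handler.
    click(disabled = false): boolean {
      if (disabled) return recording; // disabled clicks are ignored
      recording = !recording;         // flip the icon locally
      onToggle(recording);            // notify the outer recording logic
      return recording;
    },
    isRecording: () => recording,
  };
}
```

The outer shouldRecord state can still drive the actual recording; the point is only that the visual toggle no longer depends on settings being rebuilt.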

On a separate note, I'm not sure how far you're generalizing the STT and TTS solutions. It's definitely something I hope to make a plugin out of in time to come - and if it so happens your solution can be tapped on and it's sitting somewhere in public, feel free to share it (or even publish it as a plugin of your own) 😛

@KillerShoaib
Author


Hey @tjtanjin sorry for the late reply again.

I've fixed the button issue using context, and the button now works properly. I've also finished the TTS part, though it's a bit of a hack rather than a clean implementation. Here is a breakdown:

  1. I open a WebSocket connection to the backend. The backend streams continuous audio chunks in base64 format.
  2. On the frontend I simply play the base64 data as audio.
  3. When the TTS button is on, the WebSocket connection is established based on that condition.
  4. In the loop I call call_local_endpoint, which injects the message and then sends it for TTS, which starts playing the audio.
  5. Here is a bit of the code:
// call local endpoint function
const call_local_endpoint = useCallback(async (params: Params) => {
    try {
      const data = await callChatApi({
        message: params.userInput,
        userId,
        sessionId,
        deviceId,
        endpoint: CONFIG.CHAT_ENDPOINT
      });
      await params.injectMessage(data.response); //injecting the message
      setHasError(false); 
      if (isTTSEnabled && data.response) {
        playTTS(data.response); // if TTS is enabled, start playing the audio
      }
    } catch (error) {
      await params.injectMessage("Unable to connect to endpoint. Server is not running or setup already.");
      setHasError(true);
    }
  }, [userId, sessionId, deviceId, setHasError, isTTSEnabled, playTTS]);

//then calling it inside the loop
loop: {
      message: call_local_endpoint,
      path: () => {
        if (hasError) {
          return "retry";
        }
        return "loop";
      },
      renderMarkdown: ["BOT", "USER"]
    },

This is the hacky implementation I've ended up with. I hardly know React (or JS; I'm mainly a Python/AI dev), so I leaned heavily on Cursor to produce this solution. It's not optimal, but for now it does the job. Since I'm not good at React, I don't think my implementation is good enough to use as a plugin :(

@tjtanjin
Member

No worries! Glad you got things working. It doesn't seem too hacky, to be honest 😛 STT and TTS customizations are not a high priority at the moment, though when I do get to them I may also take another look at how the core library is currently designed, mainly to see if tweaks should be made for easier customization.
